DESIGN STRATEGIES FOR FILE SYSTEMS 


Stuart E. Madnick 


October 1970 


PROJECT MAC 


MASSACHUSETTS INSTITUTE OF TECHNOLOGY 


Cambridge Massachusetts 02139 


ACKNOWLEDGMENTS


The author acknowledges the many long and often heated 
discussions with his colleague, Mr. Allen Moulton, from 
which many of the basic ideas for this file system design 
were molded. 

Many colleagues generously contributed their time,
energy, and criticism to help produce this report. In
particular, thanks are due to Prof. John J. Donovan, Prof.
David Ness, and Prof. Robert M. Graham, as well as Stephen
Zilles, Ben Ashton, Hoo-min Toong, Michael Mark, Joseph
Alsop, Derek Henderson, Norm Kohn, and Claude Hans.

The author's association with MIT Project MAC as well 
as the IBM Cambridge Scientific Center provided the
environment and influenced many of the ideas formed in this 
report. 

This report was composed and edited on-line in the
CP-67/CMS Time Sharing computer system, with the aid of the
SCRIPT manuscript processing system.


Work reported herein was supported in part by
Project MAC, an M.I.T. research project sponsored
by the Advanced Research Projects Agency, Department
of Defense, under Office of Naval Research
Contract Nonr-4102(01).


DESIGN STRATEGIES FOR FILE SYSTEMS* 


Abstract 


This thesis describes a methodology for the analysis and
synthesis of modern general purpose file systems. The two
basic concepts developed are (1) establishment of a uniform
representation of a file's structure in the form of virtual
memory or segmentation and (2) determination of a hierarchy
of logical transformations within a file system. These
concepts are used together to form a strictly hierarchical
organization (after Dijkstra) such that each transformation
can be described as a function of its lower neighboring
transformation. In a sense, the complex file system is
built up by the composition of simple functional
transformations. To illustrate the specifics of the design
process, a file system is synthesized for an environment
including a multi-computer network, structured file
directories, and removable volumes.


*This report reproduces a thesis of the same title
submitted to the Alfred P. Sloan School of Management and
the Department of Electrical Engineering, Massachusetts
Institute of Technology, in partial fulfillment of the
requirements for the degree of Master of Science, June
1969.


CHAPTER ONE


Introduction 
Evolution of File Systems 

The evolution of general purpose file systems parallels
very closely the evolution of operating systems. This is not
surprising since the concept of file systems grew out of the
embryonic input-output control (IOC) functions of early
operating systems and now represents the most significant
component of most modern operating systems.

There has been very little attention formally directed
to the specific problem of analyzing operating systems. In
1967, Saul Rosen collected material for a book,
"Programming Systems and Languages"<Rosen 67>, which was to
be a distinctive selection of previously published and
unpublished reports describing the most important
programming languages and discussing many of the most
important operating system concepts. He was forced to
conclude:

"The paper on Operating Systems was prepared for
presentation at the University of Michigan
Engineering Summer Conference, June 18-29, 1962.
It has had fairly wide circulation as Rand Report
P-2584. The material covered has been of vital
importance in the development of the "classical"
operating system, yet it is difficult to find an
adequate treatment outside of very long and
usually dry system manuals. George Mealy was one
of the few working experts in the field who took
the time to write down some of the basic
principles of operating systems and also of
assembly systems."

Mr. Rosen's observations imply that very little 
attention has been expended in the attempt to generalize the 
functions of operating systems. File systems have also been
severely neglected.

In the early years of computing (roughly 1952-1962),
programmers slowly moved away from the practice of
approaching a bare machine with card decks and sharpened
pencils, fighting with the console for more or less extended
periods of time, and leaving triumphantly with final results
or in defeat with a ream of machine dump<Rosin 69>.
Operating systems have evolved, not so much as a blessing,
but as a practical necessity. As computers became faster and
more complex, it was no longer possible for an individual
programmer to be an expert in every phase of the programming
and machine usage; he now must rely on the operating staff
and system programmers to provide the necessities of life.

These operating systems were often ill-designed and
usually specialized around a single goal. One of the first
truly successful operating systems was FMS (FORTRAN Monitor
System) for the IBM 709/7090/7094 family. Its name implies
its specialization. As a result a large number of operating
systems appeared, each with its own operating procedure and
specialization. These systems were typically very closely
tied to a programming language (e.g. FORTRAN, COBOL,
Assembler).

Input Output Control Systems (IOCS) emerged as a part
of the Operating System based on the simple observation that
all programs perform some amount of input and/or output.
Therefore, rather than requiring each programmer to write a
new set of input/output routines for each program, a common
and sufficiently flexible collection of routines was
supplied with the Operating System. This situation became
especially critical as computer I/O capabilities were
extended to include high-speed, buffered, asynchronous
channels which required complex program logic to efficiently
perform input-output.

From the crude beginnings of IOCS, file systems
followed a logical, though often slow, evolution. Once all
physical input/output functions were localized in the IOCS,
many generalizations became possible. Usually, there is no
important difference among the many tape drives available at
an installation, so that any arbitrary tape unit may be used
for input or output to a program. Furthermore, later runs at
the same or different installations need not use the same
unit as long as unique correspondences can be maintained. At
first it was considered that the best practice in handling
the choice of input-output units by the object program was
to include unit assignments as an assembly parameter or to
read in unit assignments as data and initialize the program
appropriately. This practice worked well when it was
followed, which was seldom. With the advent of the
near-universal use of IOCS, a more foolproof and flexible
manner of operating was to establish the correspondences as
part of the IOCS. The object programs dealt strictly in
symbolic unit assignments.

Since the object programs no longer interacted directly 
with the I/O units nor were even aware of unit assignments,
additional degrees of freedom became available to the 
operating system, providing a more efficient and convenient 
environment. For example, the system could determine unit 
assignments automatically and dynamically, based upon 
complex criteria such as availability and performance (e.g.
I/O interference, buffering, etc.). The actual technique of 
I/O (unbuffered, single-buffered, double-buffered, etc.) 
could be removed from programmer concern. 

The proliferation of I/O device types, such as
low-speed, medium-speed, high-speed and hyper-tapes, as well
as drums and disks of all shapes and sizes, resulted in the
expansion of IOCS to include capabilities that are now
called data management or file system facilities. The basic
notion exploited is that just as the programmer had little
concern as to what tapes were to be used, he really does not
care what device is used nor what method of I/O is employed
within broad logical constraints. For example, if a
programmer wishes logically to treat his I/O data as 80
column cards, the file system could physically utilize
unit-record equipment, tapes, disks, drums, data cells, or a
host of other devices in various manners logically to
simulate the effect of input-output using 80 column cards.

This trend became irreversible with the advent of 
multi-tasking operating systems, since the availability of 
devices was continuously and dynamically changing. In such 
an environment, it becomes impractical and probably 
impossible to designate specific I/O units statically and 
arbitrarily in the program.

The importance of these data management and file 
systems cannot be overemphasized. Just as the assumption
that programs perform input-output was a basic fact, it
appears that the number and flexibility of I/O facilities
demanded by programs are continuously increasing.

A major factor in the rapid growth of file systems is 
the introduction of low cost, high capacity, high-speed, 
direct access devices such as disks, drums, and data cells. 
A description of direct access devices would emphasize the 
fact that they have two degrees of freedom rather than only
one as with tape-like devices. Since these devices can be
used for both sequential and direct access applications, the
total amount of usage increases. Of course, the extra
degrees of freedom necessitate more complex I/O routines and
further tighten the reliance on file systems to perform
these functions.

Direct access devices are usually as flexible as or 
more flexible than tape devices. Card-image or printer-image 
fixed record data types can be handled as well as
variable-length or structured data forms. Although these
capabilities could be performed by the object program, the
vast majority of these functions have been subsumed as
by-products of the file system.

The second major factor contributing to the rising
importance of file systems, as in early operating systems,
was necessity. This time it was due to the "information
explosion". As the number of users, uses, and sophistication
of use increased, the amount of information in the form of
programs and data rose correspondingly. It was no longer
convenient nor usually physically possible to haul the
required boxes of programs and data to and from the machine.
This information was converted and maintained in a more
compact but directly machine processible form, such as
magnetic tape or disk pack. Not only were the individual
programs and data collections large, but the total number of
distinct and unique files (i.e. programs and data
collections) was very large. It is not uncommon for a single
programmer to have to use from 10 to 100 separate programs
and a roughly equivalent number of data collections. This
situation became especially acute with the increased use of
online systems. A user at a remote teletype terminal could
not be expected to re-type and enter all his programs and
data from the terminal. They must be permanently maintained
and stored at the central computer facility, although
accessible and alterable under remote terminal control.
Quite obviously, it would be uneconomic and unmanageable to
store each unique file on a separate tape or disk pack.
Robert Rosin highlights these developments in his recent
survey of supervisor and monitor systems<Rosin 69>:

"A file system is especially necessary in any
system which purports to provide realistic
time-sharing. However, the advantages of this
facility cannot be overlooked in a more
conventional environment."

Thus, people were faced with the problem of using the 
I/O devices to store thousands of permanent files in 
addition to the traditional use for input, output and 
"scratch" storage. Direct access devices provide the 
capability of storing hundreds or thousands of unique files 
and accessing them in any order conveniently. This type of
direct access device usage results in many side effects. The
first problem, of course, involves a complex storage
organization facility to locate "empty" space on the device
and a directory-like mechanism to keep track of the
individual files. Many other facilities are usually
required, such as a security system to prevent unauthorized
access to restricted files, and procedures to recover from
hardware or software failures. Of course, each installation
or group develops additional elegant file system
capabilities to meet special requirements or to provide
extensive flexibility.

For the same reasons that programmers utilize and rely 
on the file system, the operating system uses the facilities 
of the file system. For example, user identification (e.g.
passwords, account numbers, etc.), accounting and charge
information as well as system self-measurement data must be
maintained dynamically using the facilities of the file
system. The previously mentioned directories of "empty"
space on direct access devices and the symbolic file
directory and access control information are usually handled
as system files. The operating system uses the file system
capabilities to store the various processing programs (e.g.
FORTRAN, COBOL, Assemblers, etc.) as well as many
infrequently used supervisor routines. Furthermore, advanced
operating systems perform "spooling", roll-in/roll-out, and
paging in conjunction with the file system. It is not hard
to realize that the file system is usually the most
important component of an operating system in terms of the
manpower required to develop and implement, and the amount
of instructions and space used by the file system.

Whereas the early operating systems along with their
rudimentary file systems revolved around the need to support
miscellaneous I/O functions for programming languages,
modern file systems are at the very center of the operating
system. The supervisor, programming systems, and object
programs are totally dependent on the file system.


Scope and Purpose

The development of file systems has suffered from many
of the same problems as that of programming languages.
Probably the single most important problem was the excessive
concern with efficiency. Of course efficiency is important,
but in most current-day programming situations other
factors, such as productivity and flexibility, are finally
receiving their long-deserved attention. The question of
efficiency can be put into proper perspective from recent
studies of real programming groups, where it has been found
that the "best" programmer was up to 15 times more
"efficient" than the least proficient programmer. It is not
the function of this paper to get deeply involved in
programming language controversies, but to illustrate the
trends and changing attitudes. For example, if the original
designers of FORTRAN had not felt that its acceptance
depended on the utmost attention to efficiency and,
therefore, had not defined the language in terms of the
hardware capabilities of a specific machine, the IBM 704, it
is possible that the evolution of languages such as
FORTRAN-IV, COBOL, ALGOL, and PL/I and generalized compiler
techniques might have proceeded in a more organized fashion.
The entire field of generalized approaches to programming
languages and compiler techniques has only recently emerged
as a major factor in the computing profession.

File systems have followed a similar development. In
the name of "efficiency", each new file system was specially
tailored to the original needs and environment of its
intended use and very seldom could benefit from the
experience or techniques of preceding systems. As the
demands on a given file system increased, new features and
facilities were added, often with a "crowbar". Each of these
piecemeal file systems drove us further and further from an
organized, generalized file system structure.

Most literature in this area has appeared in one of two 
forms. The typical system manuals describe the "clever" 
techniques used to implement a specific file system, but 
provide very little assistance for comparisons with other
current systems or in the design of new file systems. The
other type of reference deals with discussions of desirable
characteristics for future file systems, usually emphasizing
user facilities, but adds little insight into the problems
of designing and implementing such a system.

To a certain extent, generalized approaches have begun
to evolve in "time-sharing" systems. In this paper such
systems will be called conversational resource-sharing,
since time is only one of many resources that are shared and
it is the conversational or interactive nature of these
systems that is most easily distinguished from
batch-oriented operating systems.


These generalized file systems for conversational
resource-sharing operating systems developed both by design
and necessity. In order to provide all the features required
by user programs and the supervisor, a flexible design was
essential. Furthermore, owing to the complexity of the
environment and its dynamically changing aspects, it would
be impossible to devise an "optimally efficient" strategy.
The implementers were thus forced to abandon any attempt to
make the system more efficient and were free to develop a
flexible system with a clear conscience.

The goals of flexibility and efficiency need not be 
contradictory. In any multi-tasking system, which includes 
most modern, non-conversational, batch-oriented operating 
systems as well as conversational systems, I/O operations
can be performed asynchronously by channels, and the central
processor time can be utilized by executing other tasks
while I/O is in progress. In this environment file system
efficiency ceases to be of paramount concern. Furthermore,
individual user attempts to optimize performance could
result in unnecessary inefficiencies due to conflicts with
other tasks, such as excessive I/O interference from
overloading the channels. The file system, aware of the
total requirements, could provide a strategy that results in
a more harmonious arrangement, increasing system throughput
far more than individual user optimization could. 

Even in single-task or application-oriented operating
systems, there is definite value to an organized,
generalized file system. For most large, complex user
programs as well as compilers and assemblers, the program
action including precise file system requirements cannot be
statically determined since it is a dynamic function of the
input data supplied. Therefore, a dynamically flexible file
system could often outperform a specialized, but inflexible,
file system.

It is the purpose of this paper to present a general
file system design. It is extremely important to start with
a flexible but precise model although this design will
probably need to be modified and made more detailed for any
specific implementation. This issue was highlighted by
Robert Rappaport in his thesis "Implementing Multi-Process
Primitives in a Multiplexed Computer System"<Rapp 68> which
describes the development of the Traffic Controller for the
MIT Project MAC Multics System:


"After having found acceptable solutions for
the problems at hand, one asks oneself why it took
so long to arrive at these solutions and was there
any way to have done it more quickly? One might
further ask if the arrived-at solutions are in any
sense optimum?

After being involved in designing a large
system involving the work of many people, one gets
the feeling that such problems as were encountered
here are bound to crop up. The development of any
large system can only remain manageable if
distinct parts of the system remain modular and
independent.

Without a theory of computing systems to fall
back on, designing such complex systems becomes an
art, rather than a science, in which it is
impossible to prove the degree to which working
solutions to problems are in any sense optimum
solutions. In much the same way as authors write
books, large computer systems go through several
drafts before they begin to take shape. In the
absence of a theory one can only cope with the
complexity of the situation by proceeding in an
orderly fashion to first produce an initial
working model of the desired system. This part of
the work represents the major effort of the design
and implementation project. Once having arrived at
this benchmark, many of the problems may then be
seen in a clearer light and revisions to the
working model are implemented much more quickly
than were the original modules. As to the
development of a theory, one gets the impression
that it will be a long time in coming."


Therefore, while we await THE general theory of
computer science, the file system model presented in this
paper will hopefully serve the need for an "initial working
model" from which "problems may be seen in a clearer light".


CHAPTER TWO


Motivation Behind File System Design 


There are two basic goals to be satisfied by the file 
system design. It is necessary to (1) establish a uniform 
representation of a file's structure and (2) determine the
hierarchy of logical transformations that occur in a file
system. W. R. Henry's recent paper on hierarchical data
management systems<Henry 69> discusses similar notions of
separating logical and physical file control, but differs
significantly from the approaches presented in this report
in many fundamental ways. It should be a useful reference to
a reader interested in other current research in this area.


Uniform Representation of File Structure 

A typical computer system is portrayed by Figure 2.1. 
Such a configuration usually has a varied assortment of
secondary storage devices in addition to the primary
storage. Programs and data must be in primary storage in
order to be executed or operated upon, respectively.


It is generally true that if primary storage size were
limitless and very inexpensive, there would be no need for
secondary storage (possible exceptions may be backup
requirements and transfer of data). In the framework of this
report, the file system will be defined as the software
mechanism that extends the capacity of primary storage by
handling and coordinating the transfer of information to and
from the secondary storage devices. This definition is
somewhat more restrictive than other common interpretations
which include as part of the file system the physical
devices or the programs that operate upon the data. In this
interpretation the file system merely stores and transfers
information but does not operate upon it.


Figure 2.1
Physical Computer Configuration


Early file systems were usually designed to operate
with specific application programs. Since there are
potentially a very large number of different secondary
storage devices, many of which can be used in more than one
way (e.g. sequential or random access, blocked or unblocked,
etc.), each file system limited itself specifically to those
devices and organizations that were appropriate for its
intended application. Figure 2.2 depicts the relationships
between the applications, the devices, and the file systems.

This type of development produced chaotic situations.
It is somewhat analogous to assembly language programming
without any established standard calling sequences or
communication conventions, which makes it difficult, if not
impossible, to use arbitrary programs as subroutines. In
particular, it was quite common to find that data files
produced by the payroll programs, using their private file
system, could not be accessed by the file system used by the
personnel programs, and vice versa. As a result, there was
much duplication of effort and confusion in the development
and use of these early file systems.


Figure 2.2
Early File Systems
(Analogous to Assembly Language Programming)


More recently, the computer manufacturers and operating
system designers realized that it is possible to select a
small set of common logical file organizations (or access
methods) that can satisfy the needs of most application
programs. Furthermore, these access methods could be
designed in a flexible manner to operate on a variety of
different devices and device organizations. This provided
the user with a logically device-independent interface with
the file system. Figure 2.3 illustrates this structure.

This approach can be compared with the emergence of
Problem Oriented Languages, such as COBOL for business
applications and FORTRAN for scientific applications. The
access methods file systems suffered the same shortcomings
as the programming languages: (1) despite claims, they were
not really device independent, (2) occasionally it was
necessary to resort to assembly language to overcome or
bypass a restriction, and (3) it was not possible to
inter-mix access methods (the analogy would be to intermix
FORTRAN and COBOL subroutines).


Figure 2.3
Access Methods File Systems
(Analogous to Early Programming Languages,
Such as FORTRAN and COBOL)


In order to overcome the weakness in the access methods
approach, it is necessary to design a single uniform file
representation that can (1) be used for every application
and (2) be device independent. This idealistic goal is
analogous to the search for "The" universal programming
language, for which PL/I is probably the most ambitious
attempt to date.

It is reasonable to expect that such a uniform
representation will be so atomic or primitive in form that
it will be desirable to construct more powerful specialized
access methods for the convenience of the typical user.
Since the access methods are built upon the uniform
representation, it is much easier to modify or implement new
access methods or, if necessary, operate at the atomic level
to bypass the restrictions of the access methods. This
approach pushes the logical/physical separation of file
system structure much further as indicated in Figure 2.4.


Figure 2.4
Uniform File Representation
(Analogous to Universal Programming Language,
PL/I ?)
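
The layering described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the uniform representation is a position-addressed sequence of single characters; the names UniformFile and CardImageAccessMethod are hypothetical and do not come from this report. The point shown is only that an access method can be written entirely in terms of the primitive read and write operations, so that a user may still drop down to the atomic level when the access method is too restrictive.

```python
class UniformFile:
    """Hypothetical uniform representation: a growable sequence of
    single-character items, each addressed by its position."""

    def __init__(self):
        self._items = []

    def read(self, position, count):
        # Positions past the end of the file read as blanks.
        chunk = self._items[position:position + count]
        return "".join(chunk).ljust(count)

    def write(self, position, text):
        # Pad the file with blanks if the write reaches past the end.
        end = position + len(text)
        if end > len(self._items):
            self._items.extend(" " * (end - len(self._items)))
        self._items[position:end] = list(text)


class CardImageAccessMethod:
    """Hypothetical access method: 80-column card records layered on a
    UniformFile, using only its read and write operations."""

    WIDTH = 80

    def __init__(self, uniform_file):
        self._file = uniform_file

    def put_card(self, number, text):
        if len(text) > self.WIDTH:
            raise ValueError("a card holds at most 80 columns")
        self._file.write(number * self.WIDTH, text.ljust(self.WIDTH))

    def get_card(self, number):
        return self._file.read(number * self.WIDTH, self.WIDTH)


# The same file can be used through the access method or directly.
uniform = UniformFile()
cards = CardImageAccessMethod(uniform)
cards.put_card(2, "HELLO")
atomic_view = uniform.read(160, 5)   # bypasses the access method
```

Because the card-image layer holds no device knowledge of its own, replacing it with a different access method would not require any change to the uniform layer beneath it.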
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The 
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rationale behind the selection of a particular 


uniform representation is not trivial. For example, there 


are three broad classes of common uniform representaticns: 


1. 


Streams ~ every file is treated as a coentinuous 
sequential stream of information. It is possible 
to access only the current position in the stream 
or reposition to the beginning of the stream. This 
representation can be implemented conveniently on 
almost all secondary storage devices, although it 
does not provide the user with very fovwerful or 
efficient features for many applications. 
Direct-Access - every file is treated as an 
ordered collection of items. Each item is directly 
accessable by means of a unique identifier 
corresponding to its position in the ordering. 
This representation, which corresponds to primary 
storage organization, is sore powerful than 
Stream, but is very difficult to izplezent on 
intrinsically serial devices, such as sagnetic 
tape. 

Associative —- every file is treated as an 
unordered collection of items, each item is 
directly accessible by means of an identifier that 
has been “associated" with the item. This is a 
very flexible representation. Unfortunately, 


except for a small class of sophisticated 
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secondary storage devices the implementation is 
very cosplex and inefficient. 
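
The three classes can be contrasted with a small in-memory sketch. The class and method names below are hypothetical, chosen only for illustration, and no real device is modelled. The helper read_item_via_stream hints at why direct access is awkward on a serial medium: reaching item n costs a rewind followed by n+1 sequential reads.

```python
class StreamFile:
    """Stream: only the current position is accessible, plus rewind."""

    def __init__(self, items):
        self._items = list(items)
        self._position = 0

    def read_next(self):
        item = self._items[self._position]  # IndexError at end of stream
        self._position += 1
        return item

    def rewind(self):
        self._position = 0


class DirectAccessFile:
    """Direct access: an ordered collection addressed by position."""

    def __init__(self, items):
        self._items = list(items)

    def read(self, index):
        return self._items[index]

    def write(self, index, item):
        self._items[index] = item


class AssociativeFile:
    """Associative: an unordered collection addressed by an associated key."""

    def __init__(self):
        self._items = {}

    def read(self, key):
        return self._items[key]

    def write(self, key, item):
        self._items[key] = item


def read_item_via_stream(stream, index):
    """Emulate a direct-access read using only stream operations."""
    stream.rewind()
    item = None
    for _ in range(index + 1):
        item = stream.read_next()
    return item
```

The sketch covers only reading and simple replacement; insertion, deletion, and growth behave quite differently across the three classes and are left out.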

Regardless of the specific uniform representation
chosen, the important concept is that all files can be
viewed as being identical in structure independent of the
particular physical device on which the file is recorded.
This generalization is depicted in Figure 2.5, which should
be compared with Figure 2.1.


Hierarchy of Logical Transformations


Although a precise description of a file system will
not be presented until later sections, there are several
general characteristics of most file systems. In
particular, a user specifies his request, such as read or
write, by designating a file and an element within the file.
Most advanced file systems allow considerable flexibility in
the mechanism used to specify a file; it is typically
described by means of a symbolic file name. Furthermore, the
element within the file is specified in terms of the logical
representation of elements in the particular file system,
which may or may not correspond to a precise physical
specification of how and where the element is stored. For
example, a typical request might be of the form:

"Read item 23 from file ALPHA into location 1564." 

Realizing that information must usually be stored on
devices in somewhat obscure ways, there must be some
sequence of transformations required to convert the user's
request into its final form that physically operates on the
secondary storage device. Quite often the transformation is
viewed as a single step but that is a gross
oversimplification that hides the fundamental mechanisms in
use. In Figure 2.6 the conversion process is illustrated in
terms of a discrete sequence of logical transformations.
Since the specifics of these transformations may not be
obvious until the more detailed sections later, a simple
analogy is presented in Figure 2.6 that loosely parallels
the file system transformations. The analogy is only
intended to provide some insight into the rationale behind
each stage of the transformation.

The process starts from the user's request to "read
item 23 from the file ALPHA into location 1564". The first
step is to convert the symbolic file name into a unique
numeric file identifier. In the analogy, this corresponds to
looking up John Doe's identifier, which is a social security
number in this illustration. The purpose for using an
identifier is basically the same in both cases. It is usually
more convenient to store information, manually or
automatically, by means of a unique numeric "key" rather than
a symbolic name which may, under certain circumstances, not
even be unique (i.e. there may be more than one John Doe in
which case other factors must be considered in order to
uniquely identify the person under consideration).

The file identifier can then be used to conveniently
access all the information known about a file; this
information is collectively known as the file's descriptor.
In the analogy, this would correspond to requesting all
information in the social security records of 030-34-1234.

Now that everything is known about the file, it is
necessary to consider the specific operation to be
performed. Using the file descriptor, a sequence of logical
I/O commands can be produced. These are called logical I/O
commands because they do not consider the physical
characteristics of the secondary storage device to be used.
This is analogous to putting an address on a letter, which is
usually done without considering the physical destination
or the route to be taken.

In order to complete the transformation, the logical 
I/O commands must be converted into the appropriate sequence 
of physical I/O commands. This conversion may be trivial or
complex depending upon the peculiarities of the device and 
I/O interfaces to the devices. In the analogy this process 
is performed at the post office where the address is used to 
determine the physical routing needed to get the letter to 
its destination. 

The final step in the process is the physical transfer
of information. This is usually performed by means of
software/hardware interactions to activate the appropriate
device and confirm the successful completion of the request.
Of course, in the analogy this transfer is accomplished by
the postman ("neither rain nor snow nor dark of night...")
assisted by trucks, planes, trains and other automation.


CHAPTER THREE


File System Design Model 


Basic Concepts Used In File System Design 

Two concepts are basic to the general file system model
to be introduced. These concepts have been described by the
terms "hierarchical modularity" and "virtual memory". They
will be discussed briefly below.


Hierarchical Modularity 

The term "modularity" means many different things to
different people. In the context of this paper we will be
concerned with an organization similar to that proposed by
Dijkstra<Dijks 67><Dijks 68> and Randell<Rand 68>. The
important aspect of this organization is that all activities
are divided into sequential processes. A hierarchical
structure of these sequential processes results in a level
or ring organization wherein each level only communicates
with its immediately superior and inferior levels.

The notions of "levels of abstraction" or "hierarchical
modularity" can best be presented briefly by an example.
Consider an aeronautical engineer using a matrix inversion
package to solve space flight problems. At his level of
abstraction, the computer is viewed as a matrix inverter
that accepts the matrix and control information as input and
provides the inverted matrix as output. The application
programmer who wrote the matrix inversion package need not
have had any knowledge of its intended usage (superior
levels of abstraction). He might view the computer as a
"FORTRAN machine", for example, at his level of abstraction.
He need not have any specific knowledge of the internal
operation of the FORTRAN system (inferior level of
abstraction), but only of the way in which he can interact
with it. Finally, the FORTRAN compiler implementer operates
at a different (lower) level of abstraction. In the above
example the interaction between the 3 levels of abstraction
is static since after the matrix inversion program is
completed, the engineer need not interact, even indirectly,
with the applications programmer or compiler implementer. In
the form of hierarchical modularity used in the file system
design model, the multi-level interaction is continual and
basic to the file system operation.

There are several advantages to such a modular
organization. Possibly the most important is the logical
completeness of each level. It is easier for the system
designers and implementers to understand the functions and
interactions of each level and thus the entire system. This
is often a very difficult problem in very complex file
systems with tens or hundreds of thousands of instructions
and hundreds of inter-dependent routines.

Another by-product of this structure is "debugging"
assistance. For example, when an error occurs it can usually
be localized at a level and identified easily. The complete
verification (reliability checkout) of a file system is
usually an impossible task since it would require tests
using all possible data input and system requests occurring
in each potential "system state". In order to construct a
finite set of relevant tests, it is necessary to consider
the internal structure of the mechanism to be tested.
Therefore, an important goal is to design the internal
structure so that at each level, the number of test cases is
sufficiently small that they can all be tried without
overlooking an important situation. In theory, level 0 would
be checked-out and verified, then level 1, level 2, etc.,
each level being more powerful, but because of the
abstractions introduced, the number of "special cases"
remains within bounds.


Virtual Memory 

There are four very important and difficult file system
objectives: (1) a flexible and versatile format, (2) as much
of the mechanism as possible should be invisible, (3) a
degree of machine and device independence, and (4) dynamic
and automatic allocation of secondary storage. There have
been several techniques developed to satisfy these
objectives in an organized manner; the concept exploited in
this generalized file system has been called
"segmentation"<Denn 65> or "named virtual memory"<Daley 68>.
Under this system each file is treated as an ordered
sequence of addressable elements, where each element is
normally the same size unit as the main storage, a byte or
word. Therefore, each individual file has the form of a
"virtual" core memory, from whence the name of the technique
came. The size of each file is allowed to be arbitrary and
can dynamically grow and shrink. There is no explicit data
format associated with the file; the basic operations of
the file system move a specified number of elements between
designated addresses in "real" memory and the "virtual"
memory of the file system.

There are several reasons for choosing such a file
concept. In some systems the similarity between files and
main storage is used to establish a single mechanism that
serves as both a file system for static data and program
storage and a paging system<Lett 68><Daley 68><Denn 68><Salt
68> for dynamic storage management. "Virtual memory"
provides a very flexible and versatile format. When
specific formatting is desired, it can be accomplished by
the outermost file system level or by the user program. For
example, if a file is to be treated as a collection of
card-image records, it is merely necessary to establish a
routine to access 80 characters at a time starting at byte
locations 0, 80, 160, ... . Almost all other possible
formats can be realized by similar procedures.

Except for the formatting modules, the entire file
system mechanism, including allocations, buffering, and
physical location, is completely hidden and invisible to the
user. This relates closely to the objective of device
independence. In many file systems the user must specify
which device should be used, its record size (if it is a
hardware formattable device), blocking and buffering factors,
and sometimes even the physical addresses. Although the
parameters and algorithms chosen might, in some sense, be
optimal, many changes might be necessary if the program is
required to run with a different configuration or
environment. This strategy does not prevent the user from
providing additional information, such as how often the file
will be used and in what manner. The important factor is
that this information is not necessary and its significance
is determined by the file system rather than the user.

There are very serious questions of efficiency raised
by this file system strategy. Most of these fears can be
eased by the following considerations. First, if a file is
to be used very seldom (as in program development),
efficiency is not of paramount importance; if, on the other
hand, it is for long-term use (as in a commercial production
program), the device-independence and flexibility for change
and upkeep will be very important. Second, by relieving the
programmer of the complexities of the formats, devices, and
allocations, he is able to utilize his energy more
constructively and creatively to develop clever algorithms
relating to the logical structuring of his problem rather
than clever "tricks" to overcome the shortcomings or
peculiarities of the file system. Third, in view of the
complexity of current direct-access devices, it is quite
possible that the file system will be better able to
coordinate the files than the average user attempting to
specify critical parameters.


Overview of File System Design Model


The file system design model to be presented in this
paper can be viewed as a hierarchy of seven levels. In a
specific implementation certain levels may be further
sub-divided or combined as required. A recent study of
several modern file systems, which will be published in a
separate report, attempts to analyze the systems in the
framework of this basic model. In general all of the systems
studied fit into the model, although certain levels in the
model are occasionally reduced to trivial form or are
incorporated into other parts of the operating system.

The seven hierarchical levels are: 

1. Input/Output Control System (IOCS)

2. Device Strategy Modules (DSM)

3. Allocation Strategy Modules (ASM)

4. File Organization Strategy Modules (FOSM)

5. Basic File System (BFS)

6. Logical File System (LFS)

7. Access Methods and User Interface

The hierarchical organization can be described from the
"top" down or from the "bottom" up. The file system would
ordinarily be implemented by starting at the lowest level,
the Input/Output Control System, and working up. It appears
more meaningful, however, to present the file system
organization starting at the most abstract level, the access
routines, and removing the abstractions as the levels are
"peeled away".

In the following presentation the terms "file name",
"file identifier", and "file descriptor" will be introduced.
Detailed explanations cannot be provided until later
sections, but the following analogy may be used for the
reader's assistance. A person's name (file name), due to the
somewhat haphazard process of assignment, is not necessarily
unique or manageable for computer processing. A unique
identifier (file identifier) is usually assigned to each
person, such as a Social Security number. This identifier
can then be used to locate efficiently the information (file
descriptor) known about that person.


Access Methods (AM)

This level consists of the set of routines that
superimpose a format on the file. In general there will
probably be routines to simulate sequential fixed-length
record files, sequential variable-length record files, and
direct-access fixed-length record files, for example. Many
more elaborate and specialized format routines, also called
access methods or data management, can be supplied as part
of the file system. Obviously, a user may write his own
access methods to augment this level.


[Figure: Parameters and Data Bases Used by File System]


Logical File System (LFS)

Routines above this level of abstraction associate a 
symbolic name with a file. It is the function of the Logical 
File System to use the symbolic file name to find the 
corresponding unique "file identifier". Below this level the
symbolic file name abstraction is eliminated.


Basic File System (BFS) 

The Basic File System must convert the file identifier
into a file descriptor. In an abstract sense, the file
descriptor provides all information needed to physically
locate the file, such as the "length" and "location" of the
file. The file descriptor is also used to verify access
rights (read-only, write-only, etc.), check read/write
interlocks, and set up system-wide data bases. The Basic
File System performs many of the functions ordinarily
associated with "opening" or "closing" a file. Finally,
based upon the file descriptor, the appropriate FOSM for the
file is selected.


File Organization Strategy Modules (FOSM)

Direct-access devices physically do not resemble a
virtual memory. A file must be split into many separate
physical records. Each record has a unique address
associated with it. The File Organization Strategy Module
maps a logical virtual memory address into the corresponding
physical record address and offset within the record.

To read or write a portion of a file, it is necessary
for the FOSM to translate the logically contiguous virtual
memory area into the correct collection of physical records
or portions thereof. If necessary, new records are allocated
by the ASM. The list of records to be physically processed
is passed on to the appropriate DSM.
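
The translation of a contiguous virtual memory area into
record-sized pieces can be sketched as follows. This is only
an illustration in Python, not the report's implementation:
the function name split_request and the 1000-byte record size
are assumptions made for the example.

```python
def split_request(address, number, record_size=1000):
    """Split a request for `number` bytes starting at virtual
    address `address` into (logical_record, offset, length) pieces.

    record_size is an assumed fixed record length; logical records
    are numbered from 0 in this sketch.
    """
    pieces = []
    while number > 0:
        # Which logical record holds this address, and where in it?
        record, offset = divmod(address, record_size)
        # Take as much as this record can supply, but no more than asked.
        length = min(number, record_size - offset)
        pieces.append((record, offset, length))
        address += length
        number -= length
    return pieces
```

Each piece names one logical record; mapping that logical
record onto a physical record is left to the particular file
organization strategy.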

Although not necessary, the FOSM is often designed to
allocate "hidden" file buffers in order to minimize
redundant or unnecessary I/O. If the requested portion of
virtual memory is contained in a currently buffered record,
the data can be transferred to the designated user main
storage area without intervening I/O. Conversely, output to
the file may be buffered. If a sufficiently large number of
buffer areas are allocated to a file, it is possible that
all read and write requests can be performed by merely
moving data in and out of the buffers. When a file is
"closed", the buffers are emptied by updating the physical
records on the secondary storage device and released for use
by other files. Buffers are only allocated to files that are
actively in use (i.e., "open").


Allocation Strategy Modules (ASM)

The Allocation Strategy Modules keep track of the
available records on a device. They are responsible for
allocating records for a file that is being created or
expanded, and deallocating records for a file that is being
erased or truncated. The FOSM requests that a record be
allocated when needed; the ASM actually selects the record.

Quite frequently, the ASM functions are incorporated
into either the FOSM or DSM. In this paper these functions
will be kept as separate as possible by explicitly
recognizing the separate ASM level.


Device Strategy Modules (DSM)

When a large portion of a file is to be read or
written, many records must be processed. The Device Strategy
Module considers the device characteristics such as latency
and access time to produce an optimal I/O sequence from the
FOSM and ASM requests.


Input/Output Control System (IOCS)

The Input/Output Control System coordinates all
physical I/O on the computer. The status of all outstanding
I/O in process is maintained. New I/O requests are issued
directly if the device and channel are available; otherwise
the request is queued and automatically issued as soon as
possible. Automatic error recovery is attempted when
possible. Interrupts from devices and unrecoverable error
conditions are directed to the appropriate routine. Almost
all modern operating systems have an IOCS.


File Systems versus Data Management Systems

In the literature there is often confusion between
systems as described above, which this paper calls "file
systems", and systems which will be called "data management
systems", such as DM-1<Dixon 67>, GIM-1<Nel 67>, and
TDMS<Blei 67>. The confusion is to be expected since both
types of systems contain all of the functional levels
described above. The systems differ primarily in the
emphasis placed on certain levels.

In general file systems, the file is considered the
most important item and emphasis is placed on the directory
organization (Logical File System) and the lower
hierarchical levels. It is expected that specialized access
methods will be written by users or supplied with the system
as needed.

In most data management systems, the individual data
items are considered the most important aspect; therefore
emphasis is placed on elaborate access methods with minimal
emphasis on the lower levels of abstraction. Because of the
heavy emphasis on a single level, data management systems
tend to appear less hierarchical than file systems since the
lower levels are often absorbed into the access methods.


Access Methods


The virtual memory interface provided by the Logical
File System allows for very flexible user applications and
access methods. In a PL/1-like notation, calls to the
Logical File System are of the form:

     LFS_Read/Write (Filename, Addr1, Addr2, Number);

where Addr1 is the main storage address, Addr2 is the file
virtual memory address, and Number is the number of elements
to be moved.

In this paper elements will be assumed to be 8-bit
bytes. For example, a request to read 100 bytes from
location 200 within the file named ALPHA into main storage
location 1234 could be expressed:

     LFS_Read ('ALPHA', 1234, 200, 100);

Sequential fixed-length records, sequential
variable-length records, and direct-access fixed-length
records are common access methods. All of these
organizations and many more can be realized using a file's
virtual memory. Note that the records processed by the
access methods are "software" records and have no relation
to the physical/logical records processed by the FOSM and
DSM.


Sequential and Direct-Access Fixed-Length Record Access
Methods

To simulate these access methods, the file's virtual 
memory is treated as a sequence of records of the desired 
length, L.

To access these records sequentially, a position
counter, PC, is set aside that starts at 0 and is
incremented by L after each read or write. The position
counter therefore gives the location of the next sequential
record. The routine could be written as:

     LFS_Read (Filename, Location, PC, L);
     PC = PC + L;

To access these records by direct-access there is no
need for a position counter since the desired record, r, can
be found at location (r-1)*L in the file's virtual memory.
This routine could be written as:

     LFS_Read (Filename, Location, (r-1)*L, L);
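
The two fixed-length schemes can also be sketched in Python.
This is a minimal illustration under stated assumptions, not
the report's code: FILES, lfs_read, SequentialReader, and
read_direct are hypothetical names, a file's virtual memory is
modeled as a bytes object, and the read returns the data
rather than storing it at a main storage address.

```python
# Hypothetical stand-in for the file system: each file's virtual
# memory is a bytes object keyed by its symbolic name. ALPHA holds
# three 80-byte records filled with "A", "B", and "C".
FILES = {"ALPHA": b"".join(bytes([65 + i]) * 80 for i in range(3))}


def lfs_read(filename, addr, number):
    """Return `number` bytes starting at virtual address `addr`."""
    return FILES[filename][addr:addr + number]


class SequentialReader:
    """Sequential access: a position counter advanced by L per read."""

    def __init__(self, filename, record_length):
        self.filename = filename
        self.length = record_length   # L in the text
        self.pc = 0                   # position counter, starts at 0

    def read_next(self):
        record = lfs_read(self.filename, self.pc, self.length)
        self.pc += self.length        # PC = PC + L
        return record


def read_direct(filename, r, record_length):
    """Direct access: record r (numbered from 1) starts at (r-1)*L."""
    return lfs_read(filename, (r - 1) * record_length, record_length)
```

The sequential reader carries state between calls, while the
direct-access routine needs only the record number, which
mirrors the distinction drawn in the text.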


Sequential Variable-Length Record Access Method 

The Sequential Variable-Length Record Access Method
treats the file as an ordered sequence of records, each of
which may be a different length. This method can be
implemented by preceding each record with a "hidden" length
field.

These records can be accessed using a variation of the
Sequential Fixed-Length scheme. For example:

     LFS_Read (Filename, L, PC, 4);            /* Get 4 byte length */
     LFS_Read (Filename, Location, PC+4, L);   /* Get data */
     PC = PC + L + 4;                          /* Update position counter */
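
A runnable Python sketch of the same idea follows. The names
append_record and read_records are hypothetical, the file's
virtual memory is modeled as a bytearray, and storing the
4-byte length field in big-endian order is an assumption; the
text does not specify a byte order.

```python
import struct


def append_record(vm, data):
    """Append one record to the bytearray `vm`, preceded by a
    hidden 4-byte length field (big-endian, an assumed encoding)."""
    vm.extend(struct.pack(">I", len(data)))
    vm.extend(data)


def read_records(vm):
    """Yield each record in order, advancing PC by L + 4 each time."""
    pc = 0
    while pc < len(vm):
        (length,) = struct.unpack_from(">I", vm, pc)   # get 4-byte length
        yield bytes(vm[pc + 4:pc + 4 + length])        # get data
        pc += length + 4                               # update position counter
```

The caller sees only the record contents; the length fields
remain hidden inside the access method.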


Other Access Methods 

The above examples were presented to illustrate the
ease with which conventional access methods can be supported
under this file system design. The real importance of the
virtual memory concept is not its ability to provide
traditional access methods, but the ease and flexibility
with which problem-oriented access methods can be developed.
The programmer is able to design access methods based on the
needs of his problem rather than forcing his problem
solution to be constrained by a small set of limited access
methods. For example, Nelson<Nel 65> discusses some flexible
and complex file structures that can be used "as an adjunct
to creativity".

The power of a computer reaches its peak when it is
capable of amplifying the creativity of the programmer. A
system that restricts the programmer's ability to express
his ideas provides him questionable service.


A user's program references each file by means of a
unique symbolic name. It is the function of the Logical File
System to convert the symbolic name reference into its
corresponding unique file identifier. The Logical File
System performs the mapping using a "file directory
organization".

In the simplest case the file directory is entirely
stored in main storage as a two-entry table. The two entries
are the symbolic file name and its corresponding file
identifier. A look-up routine is all that is needed to serve
the function of the Logical File System. This approach is
used by several file systems because of its simplicity and
efficiency. Unfortunately, the number of files that are
allowed in the file system is restricted by the amount of
main storage available for the file directory.
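
The simplest case might be sketched in Python as below. The
class and method names are hypothetical, and assigning
identifiers from a running counter is an assumption made only
so the example is self-contained.

```python
class LogicalFileSystem:
    """Single in-memory directory: symbolic name -> file identifier."""

    def __init__(self):
        self.directory = {}   # the two-entry table: name, identifier
        self.next_id = 1      # assumed scheme for issuing identifiers

    def create(self, name):
        """Enter a new symbolic name and return its unique identifier."""
        if name in self.directory:
            raise FileExistsError(name)
        file_id = self.next_id
        self.next_id += 1
        self.directory[name] = file_id
        return file_id

    def lookup(self, name):
        """The look-up routine: map a symbolic name to its identifier."""
        return self.directory[name]
```

Because the whole table lives in main storage, the number of
entries it can hold is bounded by the storage set aside for it.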

To remove the above limitation, many file systems keep
the file directory on secondary storage. The file directory
can be treated as a standard file if its file descriptor is
always known. This allows the file directory to be
processed, expanded, and truncated using the normal file
system mechanisms. The Logical File System mapping still
involves a table look-up, only this time the table is
contained in a file's virtual memory rather than main
storage. The calls to the Basic File System are essentially
the same as the calls to the Logical File System, only a
file identifier is specified rather than a symbolic file
name.

A few of the advanced file systems have introduced the
concept of the hierarchical file directory. From a simple
point of view, a file directory hierarchy resembles and
serves a similar purpose to a PL/1 data structure. In
practice, certain files are classified as "directories" in
addition to their normal attributes. The earlier model of
the Logical File System implied that there was only one
directory file. This file contained the file identifiers for
all the other files, called "data files". This has been
extended to allow the base directory, often called the "root
directory", to contain file identifiers for directory files
as well as data files. Each subsequent directory file can
contain file identifiers for other directory files as well as
data files.

Figure 3.8 illustrates a file directory hierarchy. The
files A, B, C, and D are directory files; all the others are
data files. The data files, as well as directory files, do
not necessarily have unique symbolic names. There are 3 data
files in Figure 3.8 named "X"; as in PL/1, this ambiguity is
resolved by using qualified names such as "A.X", "A.B.D.X",
and "A.C.X".

Figure 3.8
Hierarchical File Directory Example

The file directory hierarchy serves many purposes in
addition to providing flexible and versatile facilities for
programmer usage. "File sharing" and "controlled access"
among users are very closely tied to the hierarchical
directories. Certain of these features are discussed in the
paper by Daley and Neumann<Daley 65>. A more detailed
treatment of this topic will be presented in a subsequent
paper by this author.

The implementation of the Logical File System for a
file directory hierarchy is a simple extension of the single
directory technique. The file identifier found in the root
directory either designates the desired data file or, if it
designates a secondary directory file, is used in exactly
the same manner as the root directory's identifier to
advance one more level in the hierarchy.
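
The level-by-level search can be sketched in Python. The
directory contents and identifier values below are invented
for the example and are not taken from Figure 3.8; they merely
assume a root directory A holding a data file X and
directories B and C, with D inside B.

```python
# Hypothetical layout: file identifier -> directory contents
# (symbolic name -> file identifier). Identifiers that do not
# appear as keys here are data files.
ROOT_NAME = "A"
ROOT_ID = 1
DIRECTORIES = {
    1: {"X": 10, "B": 2, "C": 3},   # directory A (root)
    2: {"D": 4},                    # directory A.B
    3: {"X": 11},                   # directory A.C
    4: {"X": 12},                   # directory A.B.D
}


def resolve(qualified_name):
    """Map a qualified name such as "A.B.D.X" to a file identifier
    by advancing one directory level per name component."""
    parts = qualified_name.split(".")
    if parts[0] != ROOT_NAME:
        raise KeyError(qualified_name)
    file_id = ROOT_ID
    for part in parts[1:]:
        # The identifier found so far must name a directory file;
        # its table is searched exactly as the root's table was.
        file_id = DIRECTORIES[file_id][part]
    return file_id
```

Three different data files share the simple name X here, yet
each qualified name resolves to a distinct identifier.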

As explained in the Overview section, a file is
physically located on secondary storage as an ordered
collection of distinct records. The information that
describes a file's size, access rights, device address or
addresses, and the mapping algorithm must be maintained by
the file system.

In a simple file system this information can be
incorporated into the file directory as long as there is a
unique one-to-one mapping of file name onto file. In a
sophisticated file system with features such as (1)
hierarchical file directory, (2) aliases that allow a single
file to be referenced by different names, (3) links that
allow a file to be referenced from various directories in
the file hierarchy or from different users, and (4)
removable or detachable "volumes" or devices, the unique
mapping cannot be guaranteed.

To produce an unambiguous file system, the file
directory information is divided into three parts: the file
name, the identifier, and the descriptor. The file name
directories are the mappings between a symbolic file name
and the corresponding identifier. The precise locations of
the file descriptors can differ for different
implementations, but are uniquely defined by the identifier.
In fact, since the file descriptors usually need not be
searched, they need not be contiguous. Usually they are
collected in either (1) a special system-wide file, (2) a
collection of files, each located on a separate device or
volume, or (3) hidden within the symbolic file name
directories.

Although it is usually not possible to keep the
symbolic file directories in main storage, the number of
files actively in use is sufficiently small that the
corresponding file descriptors can be placed in a
core-resident table called the Active File Directory or Open
File Directory.

It is the function of the Basic File System to use the
unique file identifier to locate the file descriptor and
place it in the Active File Directory unless it has already
been "opened". The Basic File System also checks that the
action requested upon the file, such as read, write, or
delete, does not violate the restrictions specified in the
file descriptor.

After verifying legal access to the file, the Basic
File System passes control to the appropriate File
Organization Strategy Module as specified in the file
descriptor entry.
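
The Basic File System steps just described might be sketched
as follows. Everything named here is hypothetical: the
descriptor fields, the identifier 7, and the function
bfs_request are assumptions chosen for illustration, and the
descriptors on secondary storage are modeled as a dictionary.

```python
# Stand-in for file descriptors kept on secondary storage,
# keyed by file identifier (field names are assumed).
DESCRIPTORS = {
    7: {"length": 3000, "access": {"read"}, "fosm": "indexed"},
}

# Core-resident table of descriptors for files actively in use.
ACTIVE_FILE_DIRECTORY = {}


def bfs_request(file_id, action):
    """Open the file if necessary, verify that `action` is
    permitted, and return the name of the FOSM to be called."""
    descriptor = ACTIVE_FILE_DIRECTORY.get(file_id)
    if descriptor is None:
        # Not yet "opened": locate the descriptor and make it resident.
        descriptor = DESCRIPTORS[file_id]
        ACTIVE_FILE_DIRECTORY[file_id] = descriptor
    if action not in descriptor["access"]:
        raise PermissionError(f"{action} not permitted on file {file_id}")
    return descriptor["fosm"]
```

A second request on the same file finds the descriptor already
in the active table and skips the lookup on secondary storage.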

The primary function of the File Organization Strategy
Module is to map a file's virtual memory address onto a
corresponding physical record number. There are at least
three common physical file organization strategies:
sequential, linked, and indexed.


Sequential File Organization Strategy

The Sequential File Organization Strategy is used by
most of the older, simpler, and non-dynamic file systems.
Under this technique logically consecutive records are
physically consecutive. For example, if each record is 1000
bytes long, virtual address 3214 would be located in the
fourth logical record. If the first logical record (i.e.,
the one containing virtual address 0) is physical record
120, the record containing virtual address 3214 would be
physical record 123.
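
The arithmetic of this example can be written out as a short
Python sketch. The function name sequential_map is
hypothetical; the record size and record numbers are those of
the example above.

```python
def sequential_map(virtual_address, record_size, first_physical_record):
    """Map a virtual address to (physical record number, offset)
    when logically consecutive records are physically consecutive."""
    # Logical records are counted from 0 here, so virtual address 0
    # falls in the record stored at first_physical_record.
    logical_record, offset = divmod(virtual_address, record_size)
    return first_physical_record + logical_record, offset
```

For virtual address 3214 with 1000-byte records starting at
physical record 120, the quotient is 3 (the fourth logical
record), giving physical record 123 at offset 214.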

There are two notable advantages claimed for this
technique. Firstly, the mapping is very simple and
efficient. The only information needed is the fixed record
size and the address of the first record. Secondly, if the
file is to be processed in a sequential manner, the
consecutive organization allows for minimizing device
latency and access time.


Although the first point is indisputable, the second
claimed advantage is open to question. If there is more
than one file on the same device that is actively in use, as
is common in a multi-tasking environment, then the device
read/write positioning will be switching rapidly among the
active files, defeating the assumed sequential accessing.

The major disadvantage of this sequential organization
is that the maximum size of the file must be assumed
statically before creating the file. If too small a size is
specified, the task will be forced to terminate if more
space is needed. If too large a size is assumed, as is
common, there is much wasted space and fragmentation.

This technique may be recommended for single-tasking
systems with few permanent files and very few files
simultaneously in use. It might be useful for a large
information utility system which is based on a large number
of independent, low cost, low usage, high capacity devices
such as data cells where wasted space is not a significant
problem.


Linked File Organization Strategy 

The Linked and Indexed File Organization Strategies
allow for files to dynamically grow and shrink. The linked
technique was probably developed first since it is simpler
and emphasizes sequential characteristics which were
primarily used in early file systems.


The linked organization requires each record of a file
to specify the location of the next logical record,
analogous to the "links" on a chain. The file descriptor
specifies only the location of the first record. It tells
nothing about the locations of the other records. As the
file grows, new records are dynamically allocated and linked
onto the file.

For sequentially processed files, the linked technique
provides a very simple and efficient mechanism. A few bytes
are used in each record to record the link, and since record
sizes are usually in the range of 1000 bytes the overhead is
minimal. Unfortunately, random or direct-access file usage
poses serious problems. If, for example, the last access
was to a data area in logical record 5, a reference to an
area in logical record 15 will require 9 intermediate I/O
accesses to find the links before reaching the desired
record. The Linked File Organization Strategy has been used
satisfactorily on systems where the vast majority of files
are accessed sequentially.
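
The cost of direct access under the linked technique can be
seen in a small Python sketch. The device is modeled as a
dictionary and the names build_chain and follow_links are
hypothetical; the sketch assumes forward links only and a
target record that lies beyond the record last accessed.

```python
def build_chain(physical_numbers):
    """Return a stand-in device: physical record number ->
    (link to next physical record or None, record contents)."""
    device = {}
    for i, p in enumerate(physical_numbers):
        last = i + 1 == len(physical_numbers)
        link = None if last else physical_numbers[i + 1]
        device[p] = (link, f"logical record {i + 1}")
    return device


def follow_links(device, physical, logical, target):
    """Starting from a record already in main storage, find the
    physical record holding logical record `target` (target > logical).

    Returns (physical record number, intermediate reads), where each
    intermediate read is an access made only to obtain a link.
    """
    reads = 0
    next_physical = device[physical][0]   # link from the record in hand
    logical += 1
    while logical < target:
        next_physical = device[next_physical][0]
        reads += 1
        logical += 1
    return next_physical, reads
```

Moving from logical record 5 to logical record 15 reads
records 6 through 14 solely for their links, which accounts
for the 9 intermediate accesses in the example.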


Indexed File Organization Strategy 

The Indexed File Organization Strategy is a
significant variation on the linked technique. Records are
dynamically allocated as needed, but rather than
distributing the record addresses throughout the file as
links, they are collected together as a table. The logical
record number is used as an "index" in the table to find the
corresponding physical record number.

If files are limited to small or medium sizes, the
index table can be stored as part of the file descriptor.
If files are allowed to be arbitrarily large, the index
table must itself be treated as a file and is broken into
separate records. In the former case, sequential and random
access processing proceed easily and efficiently. In the
latter case, sequential processing is very efficient, except
for intermittent accesses for the next portion of the index
table. Random processing may be very efficient if localized
to a single index table block; in any case it will never
exceed a small number of intermediate accesses, usually one
or two, for totally random processing.

The Indexed File Organization Strategy has the
advantage of allowing the concept of a "sparsely filled"
file. If we assume that each physical record is 1000 bytes
and each index (record number) is 4 bytes, then the index
table for a file that is 250,000 bytes long would require
250 indexes or 1000 bytes. By designating a special code,
such as 0, to indicate an index for a non-allocated record,
a file can be created with specific contents at locations
10,900, 40,000, and 287,000 but with unspecified contents
elsewhere. By convention, unspecified contents are usually
initialized as zero by the file system. The above sparse
file would only require four physical records, three records
for the specified portions of the file and one record for
the index table. As more information is written into a
sparse file, more physical records will be allocated as
needed.
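The sparse-file behavior described above can be sketched in a few lines. This is only an illustrative model under stated assumptions: the class and method names are hypothetical, physical records are held in a dictionary rather than on a device, and record numbers are handed out by a simple counter.

```python
RECORD_SIZE = 1000   # bytes per physical record (figure used in the text)
UNALLOCATED = 0      # special index code for a non-allocated record


class SparseFile:
    """Toy indexed file: index[logical record] -> physical record number."""

    def __init__(self, n_index_entries=250):
        self.index = [UNALLOCATED] * n_index_entries   # the index table
        self.records = {}      # physical record number -> bytearray
        self.next_free = 1     # 0 is reserved as the special code

    def write(self, offset, data):
        for i, byte in enumerate(data):
            logical, within = divmod(offset + i, RECORD_SIZE)
            if self.index[logical] == UNALLOCATED:     # allocate on demand
                self.index[logical] = self.next_free
                self.records[self.next_free] = bytearray(RECORD_SIZE)
                self.next_free += 1
            self.records[self.index[logical]][within] = byte

    def read(self, offset, count):
        out = bytearray()
        for pos in range(offset, offset + count):
            logical, within = divmod(pos, RECORD_SIZE)
            phys = self.index[logical]
            # unspecified contents read back as zero by convention
            out.append(0 if phys == UNALLOCATED else self.records[phys][within])
        return bytes(out)

    def physical_records_used(self):
        # data records plus one record for the 250-entry index table
        return len(self.records) + 1
```

Writing one byte at each of three widely separated offsets inside a 250,000-byte file (for instance 10,900, 40,000, and 187,000, chosen here for illustration) leaves `physical_records_used()` at four, and every untouched location reads back as zero.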

The indexed organization provides a simple and
efficient way to use programming techniques, such as "hash
coding" or "random entry" tables, that require a large
though sparse virtual memory.

Many of the most recent file systems have adopted
techniques similar to the Indexed File Organization.


64 III. FILE SYSTEM DESIGN MODEL 


When the FOSM maps a valid write request onto a logical
record for which a physical record has not been allocated,
the ASM is called to find an available record for use. There
are two common techniques used to keep track of available
records. The first technique links all available records
together. This method is often used in conjunction with a
Linked File Organization Strategy Module. The second
technique uses a "bit map" for each device. A bit map is a
function which operates on a bit string and describes the
relationship between a bit position and a physical record on
the device. For example, a convenient bit map might be: bit
0 corresponds to physical record 0, bit 1 to physical record
1, etc. If a bit is set to 0, the corresponding record is
available for allocation; otherwise it has already been
allocated to a file. The bit map provides a very compact
representation of the allocation information. The
allocation states of a device with a capacity of 8,000,000
bytes divided into 8000 1000-byte records can be stored in a
1000 byte bit map. In a file system with a large number of
high-capacity direct-access devices, it may be impossible to
keep all the bit maps in main storage. The bit map may be
subdivided into sections, such as a separate bit map for
each group of 800 records. Only one section of the bit map
for a device is kept in main storage at a time; the
remaining sections are left stored on the device.
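A bit map of this kind might be sketched as below. The class and method names are hypothetical, and the first-fit search is an assumption made for brevity; the text does not prescribe a search order.

```python
class BitMap:
    """Bit i describes physical record i: 0 = available, 1 = allocated."""

    def __init__(self, n_records):
        self.n_records = n_records
        self.bits = bytearray((n_records + 7) // 8)   # all records available

    def _locate(self, record):
        # byte holding the bit, and a mask selecting the bit within it
        return record // 8, 1 << (record % 8)

    def is_allocated(self, record):
        byte, mask = self._locate(record)
        return bool(self.bits[byte] & mask)

    def allocate(self):
        """Claim the lowest-numbered available record; None if none is left."""
        for record in range(self.n_records):
            if not self.is_allocated(record):
                byte, mask = self._locate(record)
                self.bits[byte] |= mask
                return record
        return None

    def release(self, record):
        byte, mask = self._locate(record)
        self.bits[byte] &= ~mask & 0xFF
```

For the 8000-record device of the example, `BitMap(8000)` occupies exactly 1000 bytes, matching the figure given in the text.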

Since sequential processing is a very common file
usage, the ASM may attempt to allocate records to take
advantage of this fact. Of course, any specific File
Organization Strategy Module and Device Strategy Module
group are expected to be cooperative with the Allocation
Strategy Modules to optimize overall performance. The
precise nature of meaningful cooperation would be too
detailed to discuss in this paper.

In addition to the obvious "read" and "write"
functions, direct access devices often require additional
I/O commands, such as "seek" and "search", for proper
positioning. The FOSM and ASM deal only with the logical act
of reading and writing. They transfer a set of requests
to the DSM of the form: "read record 24 into location 5400,
read record 49 into location 6400, and write record 27 from
location 9324". The DSM must translate these requests into
the obscure I/O list format required for the particular
device.

Furthermore, due to device characteristics such as
latency and access time, the order in which the requests are
performed affects the total amount of time that the device
is kept "busy". For example, if records 24 and 27 are
"closer", in some sense, to each other than record 49, it
might be more efficient to read record 24, write record 27,
and then position to read record 49.
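One way a DSM might reorder a request list is sketched below. The greedy nearest-first rule and the use of the difference in record numbers as the measure of "closeness" are assumptions made for illustration; a real device would use its own geometry and timing.

```python
def order_requests(requests, position=0):
    """Greedy nearest-first ordering.

    Each request is a tuple (operation, record, core_address).
    `position` is the record the device is currently positioned at.
    """
    pending = list(requests)
    ordered = []
    while pending:
        nearest = min(pending, key=lambda r: abs(r[1] - position))
        pending.remove(nearest)
        ordered.append(nearest)
        position = nearest[1]          # device is now positioned here
    return ordered


requests = [("read", 24, 5400), ("read", 49, 6400), ("write", 27, 9324)]
schedule = order_requests(requests)
```

Under these assumptions the schedule is read 24, write 27, read 49, the order suggested in the text.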

Input/Output Control System

The Input/Output Control System coordinates all the
physical I/O on the computer. On most modern computers there
are complex interdependencies among the physically
independent I/O devices. Usually this dependency occurs due
to the dedicated nature of "selector" channels and device
control units that can switch to any device but can only
service one device at a time. For very high-speed devices,
such as drums, the main storage access time can be an
important factor. If too many simultaneous memory requests
occur, "overrun" can occur, resulting in erroneous data
transmission. The IOCS keeps track of the status of all
devices, control units, and channels. When an I/O operation
is requested, the IOCS checks to ensure a clear path to the
device through the channels and control units and that no
I/O capacity limits will be exceeded. If it is not possible
to issue the requested I/O operation, the IOCS stores the
request on a queue. The I/O will be issued at a later time
when all conditions are satisfied. Since the I/O
interdependencies may exist among all devices, every I/O
operation, whether for the file system or a dedicated special
purpose device, must be funnelled through the IOCS.
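The path check and queueing just described can be modeled very simply. The sketch below is an assumption-laden illustration: the class name, the representation of a path as a tuple of resource names, and the retry-on-completion policy are all hypothetical, and capacity limits such as overrun are not modeled.

```python
from collections import deque


class IOCS:
    """Toy model: a request is issued only if its whole path is free."""

    def __init__(self):
        self.busy = set()      # channels and control units now in use
        self.queue = deque()   # requests waiting for a clear path

    def request(self, name, path):
        """Issue `name` if every resource on `path` is free, else queue it."""
        if self.busy.isdisjoint(path):
            self.busy.update(path)
            return "issued"
        self.queue.append((name, path))
        return "queued"

    def complete(self, path):
        """Free a finished operation's path; issue queued requests that fit."""
        self.busy.difference_update(path)
        issued = []
        for _ in range(len(self.queue)):
            name, waiting_path = self.queue.popleft()
            if self.busy.isdisjoint(waiting_path):
                self.busy.update(waiting_path)
                issued.append(name)
            else:
                self.queue.append((name, waiting_path))
        return issued
```

A request that shares a channel with an operation in progress is queued, and is issued when that operation completes and releases the channel.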

Although most modern I/O devices are very reliable,
spurious errors do occur. Usually the retry or recovery
procedure is very simple; in such a case the IOCS will
attempt corrective measures.

The caller of the IOCS is informed of the status of his
I/O request, for example (1) successful completion, (2)
unrecoverable error condition, or (3) asynchronous
interrupt.

The sophistication and scope of the IOCS depends upon
the devices to be handled and the goals of the file system
and operating system.


CHAPTER FOUR


Multi-Computer Network Environment


Background 


A general file system design model must, of course, be
modified and elaborated to satisfy the needs of any specific
desired file system environment. To illustrate the
refinement process, a unique file system design will be
presented for a multi-computer network.

Multi-computer networks are becoming an increasingly
important area of computer technology<Mad 68>. There are
several significant reasons behind the growth of
multi-computer networks:

1. To increase the power of a computer installation
in a modular manner, especially if (a) it is not
possible to acquire a larger processor, (b)
reliability is important, or (c) there are
real-time or time-sharing constraints.

2. To serve the co-ordination requirements of a
network of regional computer centers.

3. To support accessibility to a nation-wide data
base.

An example of the environment to be considered for this
paper is illustrated in Figure 4.1. This type of
multi-computer network has been in limited use for several
years in many configurations. The IBM 7094/7044
Direct-Coupled System<Rosen 69> was probably one of the
earliest practical examples of such an inter-connected
arrangement.

There are several implicit constraints imposed upon the
multi-computer system illustrated in Figure 4.1:

1. Independence of Central Processors.

Each of the central processors operates independently,
such that there is no direct processor-to-processor
data transfer or signaling; furthermore, there is no
"master" processor.

2. Non-shared Memory.

Each central processor has its own main storage
unit. These units are not shared with nor accessed
by another central processor.

3. Inter-locked Device Controllers.

The device controllers act as "traffic cops" to the
actual I/O direct access devices. They control the
traffic between a computer's I/O channel and a
selected I/O device. A single device controller will
only accept requests from one channel at a time and
will only select one I/O device (among those under
its control) at a time. Once a device controller
connects a channel with a device, the connection
remains intact until the channel releases the device
or an I/O error occurs.


Figure 4.1


Example of Multi-computer File System Network

The environment described above, although well within
the boundaries of current technology, has not been the
subject of much investigation. Such configurations are
presently very expensive and, therefore, chosen only for
very specialized situations. Even then there are only two or
three processors and very specialized software and
operational factors. A discussion of the CP-67/CMS Time
Sharing System <IBM 68a><Sea 68> will serve to establish the
relevance of the multi-computer network environment.

The CP-67/CMS Time Sharing System uses the special
hardware features of a single IBM System/360 model 67
processor augmented by software to produce an apparent
environment corresponding to the multi-computer network
illustrated in Figure 4.1, with many independent central
processors, device controllers, and direct access I/O
devices. In practice a typical single processor 360/67
configuration would produce the effect of about 30 active
processors ("virtual" System/360 model 65 processors each
with a 256,000 byte memory) and 50 active device
controllers. More detailed descriptions of the CP-67/CMS
System can be found in the References. In the traditional
sense of time-sharing, each user of the CP-67/CMS System is
provided with a "virtual" computer operated from a simulated
operator console (actually an augmented remote terminal).
Most importantly, each "virtual" computer (i.e. user)
operates logically independently of all other "virtual"
computers except for the specified inter-connected I/O
devices and device controllers.


Problems Arising In Multi-Computer Networks


There are many problems associated with the
multi-computer file system network. Some of these problems
are unique to this environment. Other problems have been
solved in traditional file systems<Corb 62><Salt 65><Scie
68>, but the solutions require major revisions due to the
peculiarities of the environment. The most significant
problems are listed briefly below.

1. No shared memory.

Usually file systems co-ordinate the status of the
files and devices by using main storage accessible
tables and data areas that describe file status,
access rights, interlocks, and allocation. There is
no such common communication area in main storage
that can be accessed by all the independent
processors.

2. No inter-computer communication.

Multi-computer configurations usually provide a
mechanism for sending signals or data transfers
between the separate processors. With this
capability the non-shared memory problem could be
solved by either (a) electing one processor to be
the "master" processor that coordinates the other
processors, or (b) supplying all the processors with
enough information such that each processor knows
what all the other processors are doing. The concept
of a "master" processor opposes the intended
homogeneous, independent processor assumption. The
possibility of supplying status information to all
other processors, although reasonable for a three or
four processor configuration, was not considered a
feasible solution for a system with hundreds of
processors and devices and thousands of files. For
these reasons, inter-computer communication,
although an available capability, was not included
as a required capability of the multi-computer
environment described above.

3. No pre-arranged allocations.

For small specialized multi-computer file networks,
each processor can be "assigned" a specific area of
a device or set of devices that can be used to write
new files; all other processors can only read from
this area by convention. This prevents the danger of
two independent processors writing files at the same
place. Such an "arrangement" is not practical for a
large, flexible multi-computer file network since
the static assignment of secondary storage space
does not take account of the dynamic and
unpredictable requirements of the independent
processors.


4. Extendable device and file allocation.

The number of devices and sizes of devices as well
as the number and sizes of files are, within reason,
unlimited. For example, a specific amount of
secondary storage equivalent to 100,000 card images
could be used to hold 10 files of 10,000 cards each
or 1,000 files of 100 cards each. This consideration
discourages techniques that result in a strong
efficiency or main storage capacity dependency on
the "size and shape" of the file system. Of course,
the magnitude of the file system size will affect
the operation, but arbitrary restrictions such as
"no more than 64 files on a device" would be
discouraged unless essential.

5. Removable volumes.

It has become common to differentiate between the
I/O mechanism used to record or read information,
called a "device", and the physical medium on which
the information is stored, called a "volume". For
most drums and many disk units, the device and
volume are inseparable. But for magnetic tape units
and many of the smaller disk units the volume,
magnetic tape reel and disk pack respectively, is
removable. It is intended that the file system
include files that are on unmounted volumes
(disconnected from an I/O device) as well as mounted
volumes. Therefore, a configuration that consists of
ten disk units may have a file system that
encompasses hundreds of volumes, only ten of which
may be actively in use at a time. Since removing and
mounting a volume takes several minutes of manual
effort, it will be assumed that the "working set" of
volumes (volumes that contain files that are
actively in use) remains static for reasonable
periods of time and is less than or equal to the
number of devices available. The fact that volumes
are removable and interchangeable (i.e. may be
mounted on different devices at different times)
does affect the organization of the file system. For
example, a scheme that involved linking files
together by means of pointers (chained addressing)
could require mounting volumes just to continue the
path of the chain even though little or no "logical"
information was requested from files on that volume.
In the worst case, it might be necessary to mount
and unmount all the volumes of the file system to
locate a desired file. Such a situation should
definitely be avoided if not totally eliminated by
the file system.

6. Structured file directories and file sharing.

In a traditional file system, the mapping between
the symbolic file name and the corresponding file
was accomplished by means of a single Master File
Directory. For modern file systems with thousands of
files scattered over hundreds of volumes, it became
desirable, if not necessary, to form groupings of
files by means of Secondary File Directories<Daley
65>. These groupings are often used by the system to
associate users with files they own (User File
Directories). This capability is also available to
the user to arrange his files into further
sub-groups (libraries) or into separate
project-related groupings. Occasionally it becomes
necessary for a file to be included in two or more
groupings (e.g. accessible by more than one User
File Directory) with potentially different access
privileges (protection) associated with each
grouping. Many of these features that are relatively
easy to implement in a traditional file system are
complicated by the introduction of independent
processors and removable volumes.

7. Fail-safe operation.

Reliable operation is a very important requirement
of a general purpose file system. There are many
known techniques for I/O error recovery and for
systematic backup and salvage procedures that are
applicable to this environment. The important problem
associated with the multi-computer network is that
potential error conditions exist that are not
normally found in traditional single computer file
systems. For a single computer system, a processor
error (including unexpected processor disconnection,
i.e. "turning off") is a rare occurrence. Such a
situation is remedied by repairing whatever physical
hardware is necessary and then running a special
"salvager" program to bring the file system into a
well-defined operational state. In the environment of
a multi-computer network, processors may be connected
or disconnected at any time without any awareness by
the other processors. To prevent any inconsistent
file system operation by the other processors and
eliminate the need for usually time-consuming
salvage techniques, it is necessary to keep the file
system in a well-defined consistent state at all
times.

The purpose of the remainder of this paper is to apply
the organization presented in the File System Design Model
section to solve the problems associated with a
multi-computer file system network. Discussion of the Access
Methods and Input/Output Control System will be omitted.
This is necessitated for brevity and in consideration of the
facts that the Access Methods are highly application
oriented, as discussed in a previous section, and that the
Input/Output Control System is usually a basic and common
component of all Operating Systems. The principal
contribution of this model lies in the structure of the five
other levels.


Logical File System

To present the goals and requirements of the Logical
File System in a brief and demonstrative manner, an example
will be used. The reader should refer to Figure 4.2 for the
following discussion. It is important that the peculiarities
of the example, such as the choice of file names (e.g.
"FILE6" and "DIR4"), not be confused with the general
characteristics of the Logical File System.

In Figure 4.2, there are 12 files illustrated.
Associated with each file is an identifier of the form
"VOL1(3)". The usage of this identifier will not be
discussed until later; in the meanwhile notice that each
file's identifier is unique. The 12 files are divided into 2
types, directory files (i.e. VOL1(3), VOL2(3), VOL3(2), and
VOL3(5)), and data files (i.e. VOL1(2), VOL1(6), VOL1(4),
VOL1(5), VOL2(4), VOL2(2), VOL3(4), and VOL3(3)). The
distinction between directory files and data files is only a
matter of usage: the Access Methods may operate upon a
directory file in the same manner as a data file;
furthermore, all lower levels (e.g. Basic File System) treat
all files as data files. This factor will be elaborated
shortly.


Figure 4.2


Example of File Directory Structure (to LFS)

It is the stated function of the Logical File System to
map a file name reference into a unique file identifier.
This mapping is a function of the requested file name
(symbolic file name path) and a starting point (base
directory) in the file directory structure. In Figure 4.2,
three example base directories are illustrated by
associating VOL1(3) with user 1, VOL2(3) with user 2, and
VOL3(2) with user 3. Therefore, a reference by user 1 to the
file name FILE2 yields the file VOL1(4).

A more complex example can be illustrated by
considering the file VOL3(4). User 3 can refer to this file
under the name FILE8. Alternatively, it can be referenced by
the name DIR3.FILE7. The file DIR3, which is associated with
VOL3(5) from user 3's base directory, is interpreted as a
lower level directory. Then from file VOL3(5), the file name
FILE7 is mapped into VOL3(4) as intended. The file VOL3(4)
can be referenced from user 2's base directory as DIR3.FILE8
or DIR3.DIR3.FILE7, for example. From user 1's base
directory, the file VOL3(4) can be referenced as FILE3,
DIR2.DIR3.FILE8, DIR2.DIR3.DIR3.FILE7, or even
DIR2.DIR3.DIR4.DIR3.DIR3.FILE7.

Two important side effects of the base file directory
and file name path facilities are that (1) a specific file
may be referenced by many different names, and (2) the same
name may be used to reference many different files.

The headings VOLUME "VOL1", VOLUME "VOL2", and VOLUME
"VOL3" are intended to indicate that the 12 files are
scattered over 3 separately detachable volumes: VOL1
(containing VOL1(2), VOL1(3), VOL1(4), VOL1(5), and
VOL1(6)), VOL2 (containing VOL2(2), VOL2(3), and VOL2(4)),
and VOL3 (containing VOL3(2), VOL3(3), VOL3(4), and
VOL3(5)). If volume VOL2 were detached from the system, user
1 could still reference VOL1(4) as FILE2 and VOL3(4) as
FILE3, but could not reference VOL3(4) as DIR2.DIR3.FILE8
nor VOL1(5) as DIR2.DIR3.DIR3.FILE6, since the path would
logically require passing through volume VOL2. Furthermore,
user 3 is allowed to erase (i.e. remove from the file system
structure) the file VOL3(4) under the name FILE8, assuming
appropriate protection privileges, whether or not volume
VOL1 is mounted, in spite of user 1's reference to file
VOL3(4) under the name FILE3.

The Logical File System could be extremely complex if
it had to specifically consider the physical addresses of
volumes, the device characteristics, and the location of
file directories on volumes, in addition to its obvious
requirement of searching file directories. These problems
are eliminated by introducing the file identifier and the
interface with the Basic File System.

The Basic File System processes requests that specify a
file in terms of a file identifier consisting of a volume
name and index, such as (VOL3,4), rather than a file name. A
sample call from the Logical File System to the Basic File
System, in PL/I-like notation, is:

CALL BFS_READ(VOLUME,INDEX,CORE_ADDR,FILE_ADDR,COUNT);
where VOLUME is the name of the volume containing the file,
INDEX is the corresponding unique index of the file,
CORE_ADDR is the main storage address into which data is to
be read, FILE_ADDR is the file virtual memory address from
which the data is to be read, and COUNT is the number of
bytes to be transmitted. Using these features, the heart of
the Logical File System (ignoring opening and closing files,
file access protection, illegal file names, etc.) reduces to
the PL/I-like code presented in Figure 4.3. It is assumed
that the file name has been broken down into an array of
path element names (e.g. if the name is DIR2.DIR3.FILE8, then
PATH(1)='DIR2', PATH(2)='DIR3', PATH(3)='FILE8', and
PATH_LENGTH=3), that BASE_VOLUME and BASE_INDEX initially
specify the (VOLUME,INDEX) identifier of the base directory,
and that each entry in a file directory is N bytes long and
formatted as indicated in the FILE_ENTRY declaration.

For efficiency, the names of all files that are
actively in use (usually a small fraction of all files in
the system) are kept in main storage in an Active Name
Directory (AND). The AND is searched before accessing the
file directories on secondary storage. Entries are deleted
from the AND when the corresponding file is "closed" or
"deleted".

Of course, the handling of access (protection) rights,
errors, and other responsibilities will make the Logical
File System much more complex, but it is important to note
that the design and implementation of the Logical File
System escapes all physical file organization and device
characteristic considerations and complexities.

DECLARE 1 FILE_ENTRY,
          2 FILENAME CHARACTER(8),
          2 VOLUME   CHARACTER(8),
          2 INDEX    FIXED BINARY;

DO I = 1 TO PATH_LENGTH;
   FILE_ENTRY.FILENAME = '';  /* force a read before the first test */
   /* scan the directory one N-byte entry at a time */
   DO J = 0 BY 1 WHILE (FILE_ENTRY.FILENAME ¬= PATH(I));
      CALL BFS_READ(BASE_VOLUME,BASE_INDEX,FILE_ENTRY,J*N,N);
   END;
   /* the matching entry names the next directory or the file */
   BASE_VOLUME = FILE_ENTRY.VOLUME;
   BASE_INDEX = FILE_ENTRY.INDEX;
END;


Figure 4.3


Example Procedure to Perform Logical File System Search
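The same search can be sketched in a modern language. The version below is an illustration only: the `resolve` function and the in-memory `directories` table are hypothetical stand-ins for directory files read through BFS_READ, and the table holds only those directory entries that the text mentions, not the full contents of Figure 4.2.

```python
def resolve(directories, base, path):
    """Map a file name path to a (volume, index) file identifier.

    `base` is the identifier of the base directory; each step looks the
    next path element up in the current directory and moves to the
    identifier found there.
    """
    current = base
    for name in path.split("."):
        entries = directories[current]   # stands in for reading the directory
        current = entries[name]          # KeyError if the name is absent
    return current


# Directory entries taken from the discussion of Figure 4.2.
directories = {
    ("VOL1", 3): {"FILE2": ("VOL1", 4), "FILE3": ("VOL3", 4),
                  "DIR2": ("VOL2", 3)},                      # user 1
    ("VOL2", 3): {"DIR3": ("VOL3", 2)},                      # user 2
    ("VOL3", 2): {"FILE8": ("VOL3", 4), "DIR3": ("VOL3", 5)},  # user 3
    ("VOL3", 5): {"FILE7": ("VOL3", 4)},
}

same_file = resolve(directories, ("VOL1", 3), "DIR2.DIR3.DIR3.FILE7")
```

Here `same_file` is the identifier of VOL3(4), the same file that user 1 also reaches as FILE3, which illustrates that one file may be known by many names.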

The Basic File System must convert the file identifier
supplied from the Logical File System into a file descriptor
that can be processed by the File Organization Strategy
Module. A file descriptor contains information such as the
volume name, physical location of the file on the volume,
and the length of the file. Every file must have an
associated file descriptor, but since the number of passive
files (i.e. not actively in use) might be very large, the
file descriptors are maintained on secondary storage until
needed (i.e. file is "opened"). In organizing the secondary
storage maintenance of the file descriptors there are
several important considerations:

1. There must be a unique file descriptor for each
file regardless of how often the file appears in
file directories or what symbolic names are used.
This is required to maintain consistent
interpretation of a file's status.

2. The file descriptor information for a file must
reside on the same volume as the file. This is
reasonable since if either the file or its
descriptor is not accessible at some time by the
system (i.e. unmounted) the file cannot be used;
this possibility is minimized by placing them on the
same volume.

3. In the same manner that the Logical File System
was simplified by using the facilities of the lower
hierarchical level, the file descriptors should be
maintained in a manner that allows the File
Organization Strategy Module to process them as
normal files.

These problems are solved by the use of the Volume File
Descriptor Directory (VFDD). There is a single VFDD for each
volume; it contains the file descriptors for all files
residing on the volume. The file descriptors are of fixed
length and are located within the VFDD positionally
according to the corresponding file identifier's index. In
order to exploit the facilities provided by the File
Organization Strategy Module, the VFDD can be processed by
the lower levels as a normal file. It is assigned a unique
file identifier consisting of the volume name and an index
of 1; in fact the file descriptor for a VFDD is stored (when
not in use) as its own first entry. Figure 4.4 presents
diagrammatically the logical file structure of Figure 4.2
with the added detail of the Volume File Descriptor
Directories and File Directory formats.

For efficiency, the descriptors of all files that are
actively in use are stored in an Active File Directory
(AFD). The AFD is searched before accessing the Volume File
Descriptor Directory.


Figure 4.4


Example of File Directory Structure (to BFS)

(Diagram not reproduced. It shows the Volume File Descriptor
Directory for each of the volumes "VOL1", "VOL2", and "VOL3",
with each entry pointing to the corresponding file, together
with the file directory entries that pair a file name such as
FILE2, FILE6, or DIR3 with a file identifier.)




The File Organization Strategy Module processes
requests that specify a file in terms of a file descriptor
(the entry extracted from the VFDD) rather than a file name
or file identifier. A sample call from the Basic File System
to the File Organization Strategy Module, in PL/I-like
notation, is:

CALL FOSM_READ(DESCRIPTOR,CORE_ADDR,FILE_ADDR,COUNT);
where CORE_ADDR, FILE_ADDR, and COUNT have the same
interpretation as discussed above.

The primary function of the Basic File System reduces
to the single request:

CALL FOSM_READ(VFDD_DESCRIPTOR,DESCRIPTOR,M*(INDEX-1),M);
where VFDD_DESCRIPTOR is the descriptor of the VFDD
associated with the volume name supplied by the Logical File
System as part of the file identifier, INDEX is from the
specified file identifier, M is the standard length of a
VFDD entry, and DESCRIPTOR is the desired file descriptor.
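The positional lookup can be sketched as follows. The descriptor length of 16 bytes and the byte-string model of the VFDD are assumptions chosen for illustration, and `fosm_read` is a hypothetical stand-in for the FOSM_READ call that returns the bytes rather than storing them at a core address.

```python
M = 16   # assumed length of one VFDD entry (file descriptor), in bytes


def fosm_read(file_bytes, file_addr, count):
    """Stand-in for FOSM_READ: `count` bytes starting at `file_addr`."""
    return file_bytes[file_addr:file_addr + count]


def get_descriptor(vfdd_bytes, index):
    """Fetch the descriptor for file `index` on the volume.

    Indexes start at 1, and entry 1 describes the VFDD itself, so the
    entry for file `index` begins at byte M * (index - 1).
    """
    return fosm_read(vfdd_bytes, M * (index - 1), M)


# A toy VFDD with five entries; entry k is filled with the byte value k.
vfdd = b"".join(bytes([k]) * M for k in range(1, 6))
descriptor_4 = get_descriptor(vfdd, 4)
```

Because every entry has the same length, no search of the VFDD is needed: one read at a computed address returns the descriptor.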

The Basic File System performs several other tasks,
such as protection validation and maintenance of the
core-resident Active File Directory that enables efficient
association between a file's identifier and descriptor for
files that are in use (i.e. "open"). But, as in the Logical
File System, the domain of the Basic File System is
sufficiently small and narrow that it remains a conceptually
simple level in the hierarchy.


File Organization Strategy Modules

The Logical File System and Basic File System are, to a
great extent, application and device independent. The File
Organization Strategy Modules are usually the most critical
area of the file system in terms of overall performance; for
this reason it is expected that more than one strategy may
be used in a large system. Only one strategy will be
discussed in this section; the reader may refer to the
papers listed in the References<Corb 62><Mad 68b><Salt
65><Scie 68> for other possible alternatives.

The FOSM must map the logical file address onto a
physical record address or hidden buffer based upon the
supplied file descriptor information. In the simplest case,
the mapping could be performed by including a two-part table
in the file descriptor. The first part of each entry would
indicate a contiguous range of virtual file addresses; the
second part of each entry would designate the corresponding
physical record address. It has been assumed, however, that
all file descriptors have a specific length, whereas the
mapping table is a function of the file's length and is
potentially quite large. Therefore, it is not feasible to
include the entire mapping table as part of the file 
descriptor. One of the most powerful file organization
strategies utilizes file maps; Figure 4.5 illustrates such
an arrangement.

In this example it is assumed that each file is divided
into 1000 byte physical records. A file can be in one of 
several states depending upon its current length. If the 
file's length is in the range 1 to 999 bytes, the file 
descriptor contains the address of the corresponding
physical record. If the file is between 1000 and 499,999
bytes long, the file descriptor specifies the address of a
file map located on secondary storage. Each entry of the
file map (assumed to require 2 bytes) designates the
physical address of a block of the file (blocks are ordered
by virtual file addresses: 0-999, 1000-1999, 2000-2999,
etc.). Furthermore, for files greater than 500,000 bytes,
but less than 250,000,000 bytes, there are 2 levels of file
maps as illustrated.
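The size limits quoted above follow from simple arithmetic, which the Python sketch below makes explicit. It assumes that a file map occupies exactly one 1000-byte physical record (so that it holds 500 two-byte entries) and reads the length ranges as half-open intervals; both points are interpretations of the example, and the names map_levels and locate are hypothetical.

```python
# Sketch only. Sizes come from the example in the text; the half-open range
# boundaries and the assumption that a file map fills exactly one 1000-byte
# record are interpretations, not statements from the report.
from typing import List

RECORD_SIZE = 1000                              # bytes per physical record
ENTRY_SIZE = 2                                  # bytes per file map entry
ENTRIES_PER_MAP = RECORD_SIZE // ENTRY_SIZE     # 500 entries per map


def map_levels(file_length: int) -> int:
    """Number of file map levels needed for a file of this length."""
    if file_length < RECORD_SIZE:
        return 0                                # descriptor points at the record
    if file_length < ENTRIES_PER_MAP * RECORD_SIZE:
        return 1                                # one file map
    if file_length < ENTRIES_PER_MAP ** 2 * RECORD_SIZE:
        return 2                                # a map of file maps
    raise ValueError("file too long for two levels of file maps")


def locate(address: int, levels: int) -> List[int]:
    """Indices to follow, outermost map first, ending with the byte offset."""
    block, offset = divmod(address, RECORD_SIZE)
    path = []
    for _ in range(levels):
        block, entry = divmod(block, ENTRIES_PER_MAP)
        path.insert(0, entry)
    return path + [offset]
```

For instance, locate(1_234_567, 2) yields [2, 234, 567]: entry 2 of the first-level map, entry 234 of the second-level map it designates, and byte 567 within the data block. Each map level adds one access ahead of the access to the data block itself.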

This strategy has several advantages. Under the worst 
conditions of random access file processing only from one to
three I/O operations need to be performed. By utilizing
several hidden buffers for blocks of the file as well as
file maps, the number of I/O operations required for file
accesses can be drastically reduced.

The function of allocation and deallocation of blocks
involves several separate factors. Before describing the 
implementation of the mechanisms, it is wise to review the 
desired characteristics: 

1. A file is allowed to grow in size; the FOSM will
request additional blocks from the ASM for the
data portions of a file or its index tables, as
needed.

2. Common direct access devices contain from 8000 to
32000 separately allocatable blocks; thus it is
not feasible to store all allocation information
in main storage.

3. Since two independent processors may be writing
new files on the same volume at the same time, it
is necessary to provide interlocks such that they
do not accidentally allocate the same block to more
than one file, yet do not require one processor to
wait until the other processor finishes.

These problems can be solved by use of a special Volume
Allocation Table (VAT) on each volume. In this scheme, a
volume must be subdivided into arbitrary contiguous areas.
For direct access devices with movable read/write heads,
each discrete position (known as a "cylinder") covers an
area of about 40 to 160 blocks. A cylinder is a reasonable
unit of subdivision. For each cylinder on the volume, there
is a corresponding entry in the VAT. Each entry contains a
"bit map" that indicates which blocks on that cylinder have
not been allocated. For example, if a cylinder consists of
40 blocks, the bit map in the corresponding VAT entry would
be 40 bits long. If the first bit is a "0", the first block
has not been allocated; if the bit is a "1", the block has
already been allocated. Likewise for the second, third, and
remaining bits.
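The bit map convention just described can be sketched in a few lines of Python. The list-of-flags representation and the function names are assumptions chosen for readability; an actual VAT entry would pack the bits into a record on the volume.

```python
# Sketch only: one VAT entry's bit map held as a list of 0/1 flags.
# The 40-block cylinder matches the example in the text; the function
# names are hypothetical.
BLOCKS_PER_CYLINDER = 40


def new_bit_map(blocks: int = BLOCKS_PER_CYLINDER) -> list:
    """All bits 0: no block on the cylinder has been allocated."""
    return [0] * blocks


def first_free(bit_map: list) -> int:
    """Position of the first 0 bit, or -1 if every block is allocated."""
    for position, bit in enumerate(bit_map):
        if bit == 0:
            return position
    return -1


def allocate(bit_map: list) -> int:
    """Mark the first free block as allocated and return its position."""
    position = first_free(bit_map)
    if position >= 0:
        bit_map[position] = 1
    return position


bits = new_bit_map()
a = allocate(bits)   # first block on the cylinder
b = allocate(bits)   # second block
```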

When the FOSM first requests allocation of a block on a
volume, the ASM selects a cylinder and requests that the DSM
read the corresponding VAT entry into main storage. An
available block, indicated by a "0" bit, is located and then
marked as allocated. As long as the volume remains in use,
the VAT entry will be kept in main storage and blocks will
be allocated on that cylinder. When all the blocks on that
cylinder have been allocated, the updated VAT entry is
written out and a new cylinder selected. With this technique
the amount of main storage required for allocation
information is kept to a minimum (about 40 to 160 bits per
volume); at the same time the number of extra I/O operations
is minimized (about one per 40 to 160 blocks of allocation).
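A toy model of this allocation cycle is given below as a Python sketch. The class name ToyASM, the in-memory stand-in for the volume, the read and write counters, and the policy of choosing the first cylinder with a free block are all assumptions; the text does not say how a cylinder is selected.

```python
# Sketch only: a toy Allocation Strategy Module. The class name, the
# in-memory "volume", and the I/O counters are assumptions for illustration.
class ToyASM:
    def __init__(self, cylinders: int, blocks_per_cylinder: int):
        # The VAT "on the volume": one bit map (list of 0/1) per cylinder.
        self.vat = [[0] * blocks_per_cylinder for _ in range(cylinders)]
        self.blocks_per_cylinder = blocks_per_cylinder
        self.current = None      # number of the cylinder being filled
        self.entry = None        # its VAT entry, held in main storage
        self.reads = 0           # VAT entries read from the volume
        self.writes = 0          # VAT entries written back

    def _select_cylinder(self) -> None:
        """Read in the VAT entry of the first cylinder with a free block."""
        for number, bit_map in enumerate(self.vat):
            if 0 in bit_map:
                self.current = number
                self.entry = list(bit_map)   # copy = "read into main storage"
                self.reads += 1
                return
        raise RuntimeError("volume is full")

    def allocate(self) -> int:
        """Return the volume-wide number of a newly allocated block."""
        if self.entry is None:
            self._select_cylinder()
        position = self.entry.index(0)       # first available block
        self.entry[position] = 1             # mark it allocated
        block = self.current * self.blocks_per_cylinder + position
        if 0 not in self.entry:              # cylinder exhausted
            self.vat[self.current] = self.entry   # write updated entry out
            self.writes += 1
            self.entry = None                # select a new cylinder next time
        return block


asm = ToyASM(cylinders=3, blocks_per_cylinder=40)
blocks = [asm.allocate() for _ in range(45)]
```

In this run 45 blocks are allocated at the cost of two VAT reads and one VAT write, and only one 40-bit entry is resident at any moment.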

The problem of interlocking the independent processors
still remains. As long as the processors are allocating
blocks on different cylinders using separate VAT entries,
they may both proceed uninterrupted. This condition can be
accomplished by utilizing a hardware feature known as "keyed
records" available on several computers including the IBM
System/360. Each of the VAT entries is a separate record
consisting of a physical key area and a data area. The data
area contains the allocation information described above.
The key area is divided into two parts: the identification
number of the processor currently allocating blocks on that
cylinder and an indication if all blocks on that cylinder
have been allocated. A VAT entry with a key of all zeroes
would identify a cylinder that was not currently in use and
had blocks available for allocation.

There are I/O instructions that can be used by the DSM
that will automatically search for a record with a specified
key, such as zero. Since the device controller will not
switch processors in the midst of a continuous stream of I/O
operations from a processor (i.e. "chained I/O commands"),
it is possible to generate an uninterruptible sequence of
I/O commands that will (1) find an available cylinder by
searching the VAT for an entry with a key of zero and (2)
change the key to indicate the cylinder is in use. This
solves the multi-processor allocation interlock problem.
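The search-and-claim sequence can be imitated in software, as in the Python sketch below. Here a threading.Lock stands in for the device controller's refusal to switch processors in the middle of a command chain, and two threads stand in for the two processors. The key layout (processor number, full flag), the class ToyVAT, and the requirement that processor numbers be nonzero are assumptions for illustration only.

```python
# Sketch only: threading.Lock stands in for the device controller, which
# will not switch processors in the middle of a chained command sequence.
# The key layout (processor id, full flag) and all names are assumptions.
import threading

FREE_KEY = (0, 0)   # key of all zeros: not in use, blocks still available


class ToyVAT:
    def __init__(self, cylinders: int):
        self.keys = [FREE_KEY] * cylinders
        self._controller = threading.Lock()

    def claim_cylinder(self, processor_id: int) -> int:
        """Search for a key of zero and rewrite it, as one indivisible step."""
        with self._controller:
            for cylinder, key in enumerate(self.keys):
                if key == FREE_KEY:                           # (1) search on key
                    self.keys[cylinder] = (processor_id, 0)   # (2) rewrite key
                    return cylinder
        return -1                                             # nothing available


vat = ToyVAT(cylinders=8)
claimed = {1: [], 2: []}


def processor(processor_id: int) -> None:
    for _ in range(4):
        claimed[processor_id].append(vat.claim_cylinder(processor_id))


threads = [threading.Thread(target=processor, args=(n,)) for n in (1, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whatever the interleaving of the two threads, no cylinder appears in both lists, because the search and the key rewrite cannot be separated.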

Device Strategy Modules

The Device Strategy Modules convert "logical I/O
requests" from the File Organization Strategy Modules and
Allocation Strategy Modules into actual computer I/O command
sequences that are forwarded to the Input/Output Control
System for execution.

When a request to transfer a large portion of a file
(10,000 bytes for example) is issued, it is unlikely that a
significant number of the needed blocks are in hidden
buffers. It will, therefore, be necessary to request I/O
transfer for several blocks (e.g. about 10 blocks if each
block is 1000 bytes long). The FOSM will generate logical I/O
requests of the form: "read block 227 into location 12930,
read block 211 into location 13930, etc." The DSM must
consider the physical characteristics of the device such as
rotational delay and "seek" position for movable heads. It
then decides upon an optimal sequence to read the blocks and
generates the necessary physical I/O command sequence
including positioning commands. The Input/Output Control
System actually issues the physical I/O request and handles
error retry and other housekeeping as discussed earlier. The
detailed strategy for choosing the optimal I/O sequence is,
of course, very device dependent and will not be elaborated
here.
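Purely as an illustration of what such a reordering might look like, the Python sketch below sorts logical requests so that the heads sweep in one direction before returning. This is one simple possibility and not the strategy of this report: it assumes that block numbers map to cylinders by integer division, it ignores rotational delay, and the last two requests in the sample list are invented.

```python
# Sketch only: one simple ordering, not the report's strategy. It assumes
# block numbers map to cylinders by integer division and ignores
# rotational delay entirely.
BLOCKS_PER_CYLINDER = 40   # assumed, matching the smallest cylinder size cited


def order_requests(requests, head_cylinder=0):
    """Sort (block, location) pairs: sweep upward from the heads, then back."""
    def cylinder(request):
        return request[0] // BLOCKS_PER_CYLINDER

    ahead = sorted((r for r in requests if cylinder(r) >= head_cylinder),
                   key=lambda r: r[0])
    behind = sorted((r for r in requests if cylinder(r) < head_cylinder),
                    key=lambda r: r[0], reverse=True)
    return ahead + behind


# Logical requests in the form used in the text: (block, main storage location).
logical = [(227, 12930), (211, 13930), (45, 14930), (230, 15930)]
physical = order_requests(logical, head_cylinder=5)
```

With the heads at cylinder 5, blocks 211, 227, and 230 are read without any seek, and block 45 on cylinder 1 is read last.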

Other Considerations


The preceding sections have highlighted the framework
of a file system. There are, of course, many other important
decisions to be made in such a system, such as the format
and organization of tables, error conditions<Lock 68>,
measurement and accounting mechanisms, etc. One of the
subtle points will be discussed in this section.

The Basic File System is intended to deal with files 
represented by unique identifiers. In the specific system 
presented, the identifier is designated as the tuple, 
<volume, index in VFDD>. This representation resulted in a
very efficient mechanism for accessing a file's descriptor
that avoided much of the time-consuming table look-up.
Unfortunately, this representation is not temporally unique.
It has been assumed that when a file is deleted, the VFDD
index position used for that file's descriptor is available
for use by new files that may be created. This would not be
a problem if all instances of the deleted file's identifier
were removed from the system at the same time, but there may
be more than one path to the file due to links from other
symbolic file directories. The strategy used by the Basic
File System did not provide any convenient means to locate
all references (i.e. links) to a specific file. Furthermore, 
even if such a mechanism existed, it would not solve the
problem since the reference may exist in a file directory
that is located on a volume that is not physically mounted
or accessible by the system at the time of deletion.
Therefore, in such an environment, it is possible to have
links in directories that identify files that have been
deleted. The danger exists that the following sequence of
events may occur: (1) a file is created and assigned
identifier, <ALPHA,5>, (2) a link is made to that file, (3)
the file is deleted by its creator, (4) a new file is
created and coincidentally assigned the identifier <ALPHA,5>,
and (5) the link previously created is used without realizing
that the intended file has been deleted and replaced by some
other arbitrary file!

Fortunately, this dilemma is not irrevocable; there is
a multitude of solutions. Two simple variations would be (1)
never reuse VFDD entries, allowing the VFDD file to grow
continually and become "sparse", or (2) maintain a count of
the number of links to a file and reuse the VFDD entry only
when all links have been removed. A better solution can be
formulated by attacking the original goal of generating truly
unique file identifiers. The Multics Operating System has
similar requirements; it forms unique identifiers by
concatenating the central processor's unique serial number
with the chronolog clock time with accuracy in the range of
microseconds. A much simpler scheme can be incorporated into
the file system by associating a separate counter with each
volume. Whenever a new file is created on a volume and
assigned a VFDD entry, the value of the corresponding
counter is incremented by one. For the purpose of the file
system, the tuple, <volume, counter value>, is a unique
identification of a file.

The counter value, which monotonically increases, 
cannot be efficiently used as a direct index into a finite
size file descriptor directory. A minor modification to the
Basic File System design can incorporate the ideas of the
above discussion. The file identifier can be constructed
from the triple, <volume, VFDD index, counter value>. In
this context the counter value will be called a "key", since
its sole purpose is to verify that the accessed VFDD entry
is correct by attempting to "unlock" the entry (i.e.
comparing the key from the VFDD entry with the key from the
symbolic file directory which was copied from the VFDD when
the link was initially established).
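The effect of the key can be demonstrated by replaying the create, link, delete, and re-create sequence described earlier. The Python sketch below is a toy model: the class ToyVFDD and its method names are hypothetical, VFDD indices are 0-based for convenience, and the first free entry is always reused so that the collision is certain to occur.

```python
# Sketch only: a toy VFDD for one volume. Class and method names are
# hypothetical; the triple <volume, VFDD index, key> follows the text.
class ToyVFDD:
    def __init__(self, volume: str, size: int):
        self.volume = volume
        self.entries = [None] * size   # each slot: None or (key, descriptor)
        self.counter = 0               # per-volume counter, never reused

    def create(self, descriptor: str):
        """Store a descriptor in a free slot and return the file identifier."""
        index = self.entries.index(None)      # reuse the first free entry
        self.counter += 1                     # new key for every new file
        self.entries[index] = (self.counter, descriptor)
        return (self.volume, index, self.counter)

    def delete(self, identifier) -> None:
        _, index, _ = identifier
        self.entries[index] = None            # slot becomes reusable

    def lookup(self, identifier) -> str:
        """Return the descriptor only if the key "unlocks" the entry."""
        _, index, key = identifier
        entry = self.entries[index]
        if entry is None or entry[0] != key:
            raise LookupError("stale link: file has been deleted")
        return entry[1]


vfdd = ToyVFDD("ALPHA", size=8)
old = vfdd.create("payroll file")    # (1) file created
link = old                           # (2) link copies the identifier
vfdd.delete(old)                     # (3) file deleted by its creator
new = vfdd.create("unrelated file")  # (4) same VFDD index is reused
```

Both identifiers name VFDD index 0, yet vfdd.lookup(link) raises LookupError because the stored key is now 2 while the link still carries key 1; vfdd.lookup(new) succeeds.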

The above problems are typical of the factors that must 
be considered by file system designers. The general file 
system model will very seldom be a complete description of a
specific implementation and it certainly will not replace
the need for systems analysts, but it can save many months
of the initial design!

To a large extent file systems are currently developed
and implemented in much the same manner as early "horse-less
carriages", that is, each totally unique and "hand-made"
rather than "mass produced". Compilers, such as FORTRAN,
were once developed in this primitive manner; but due to
careful analysis of operation (e.g., lexical, syntax, and
semantic analysis, etc.), compilers are sufficiently well
understood that certain software companies actually offer
"do-it-yourself FORTRAN kits". Since modern file systems
often outweigh all other operating system components such as
compilers, loaders, and supervisors, in terms of programmer 
effort and number of instructions, it is important that a 
generally applicable methodology be found for file system 
development. 

This paper presents a modular approach to the design of 
general purpose file systems. Its scope is broad enough to
encompass most present file systems of advanced design and
file systems presently planned, yet basic enough to be
applicable to more modest file systems.

The file system strategy presented is intended to serve 
two purposes: (1) to assist in the design of new file 
systems and (2) to provide a structure by which existing 
file systems may be analyzed and compared.